








Psychometrika 














A JOURNAL DEVOTED TO THE DEVELOPMENT OF PSYCHOLOGY AS A QUANTITATIVE RATIONAL SCIENCE









































THE PSYCHOMETRIC SOCIETY ORGANIZED IN 1935 
















VOLUME 16

1951














PSYCHOMETRIKA, the official journal of the Psychometric Society, is devoted to the development of psychology as a quantitative rational science. Issued four times a year, on March 15, June 15, September 15, and December 15.


JUNE 1951    VOLUME 16, NUMBER 2


Printed for the Psychometric Society at 23 West Colorado Avenue, Colorado Springs, Colorado. Entered as second class matter, September 17, 1940, at the Post Office of Colorado Springs, Colorado, under the act of March 3, 1879. Editorial Office, Department of Psychology, The University of North Carolina, Chapel Hill, North Carolina.


Subscription Price: The regular subscription rate is $10.00 per volume. The subscriber receives each issue as it comes out, and a second complete set for binding at the end of the year. All annual subscriptions start with the March issue and cover the calendar year. All back issues are available. The price is [...] per issue or $5.00 per volume (one set only). Members of the Psychometric Society pay annual dues of $5.00, of which $4.50 is in payment of a subscription to Psychometrika. Student members of the Psychometric Society pay annual dues of $3.00, of which $2.70 is in payment for the journal.


Application for membership and student membership in the Psychometric Society, together with a check for dues for the calendar year in which application is made, should be sent to


RAYMOND A. KATZELL, Chairman of the Membership Committee
Psychological Services Center, Syracuse University, Syracuse 10, New York


Payments: All bills and orders are payable in advance. Checks covering membership dues should be made payable to the Psychometric Society. Checks covering regular subscription to Psychometrika and back issue orders should be made payable to the Psychometric Corporation. All checks, notices of change of address, and business communications should be addressed to


ROBERT L. THORNDIKE, Treasurer, Psychometric Society and Psychometric Corporation
Teachers College, Columbia University
New York 27, New York


Articles on the following subjects are published in Psychometrika: 


(1) the development of quantitative rationale for the solution of psychological problems;

(2) general theoretical articles on quantitative methodology in the social and biological sciences;

(3) new mathematical and statistical techniques for the evaluation of psychological data;

(4) aids in the application of statistical techniques, such as nomographs, tables, work-sheet layouts, forms, and apparatus;

(5) critiques or reviews of significant studies involving the use of quantitative techniques.


The emphasis is to be placed on articles of type (1), in so far as articles of this type are available.


In the selection of the articles to be printed in Psychometrika, an effort is made to obtain objectivity of choice. All manuscripts are received by one person,


(Continued on the back inside cover page) 
















Psychometrika 





CONTENTS 


A GENERAL SOLUTION FOR THE LATENT CLASS MODEL OF LATENT STRUCTURE ANALYSIS
BERT F. GREEN, JR.

TIME-LIMIT TESTS: ESTIMATING THEIR RELIABILITY AND DEGREE OF SPEEDING
LEE J. CRONBACH and W. G. WARRINGTON

OPTIMAL TEST LENGTH FOR MAXIMUM BATTERY VALIDITY
PAUL HORST

REMARKS ON THE METHOD OF PAIRED COMPARISONS: II. THE EFFECT OF AN ABERRANT STANDARD DEVIATION WHEN EQUAL STANDARD DEVIATIONS AND EQUAL CORRELATIONS ARE ASSUMED
FREDERICK MOSTELLER

REMARKS ON THE METHOD OF PAIRED COMPARISONS: III. A TEST OF SIGNIFICANCE FOR PAIRED COMPARISONS WHEN EQUAL STANDARD DEVIATIONS AND EQUAL CORRELATIONS ARE ASSUMED
FREDERICK MOSTELLER

RATE OF ADDITION AS A FUNCTION OF DIFFICULTY AND AGE
JAMES E. BIRREN and JACK BOTWINICK

A MECHANICAL MODEL ILLUSTRATING THE SCATTER DIAGRAM WITH OBLIQUE TEST VECTORS
HAROLD GULLIKSEN and LEDYARD R TUCKER


(Continued) 








VOLUME SIXTEEN JUNE 1951 NUMBER TWO 





A GRAPHICAL METHOD FOR THE RAPID CALCULATION OF BISERIAL AND POINT BISERIAL CORRELATION IN TEST RESEARCH . . . 239
HOWARD W. GOHEEN and MELVIN D. DAVIDOFF

GEORGE KINGSLEY ZIPF, Human Behavior and the Principle of Least Effort . . . 243
A Review by DAVID A. GRANT

ALPHONSE CHAPANIS, WENDELL R. GARNER, AND CLIFFORD T. MORGAN, Applied Experimental Psychology . . . 244
A Review by ROBERT L. CHAPMAN

BOOKS RECEIVED . . . 245











PSYCHOMETRIKA—VOL. 16, NO. 2 
JUNE, 1951 


A GENERAL SOLUTION FOR THE LATENT CLASS MODEL OF 
LATENT STRUCTURE ANALYSIS 


BERT F. GREEN, JR. 
EDUCATIONAL TESTING SERVICE 
AND 
PRINCETON UNIVERSITY 


For the point distribution model of Lazarsfeld’s latent structure analysis, the general matrix equation is stated which relates the manifest data in the form of joint occurrence matrices to the latent parameters. The relationship of the item responses and these joint occurrence matrices is also indicated in matrix form. A general solution for the latent parameters is then presented, which is based on the notion of factoring two joint occurrence matrices. The solution is valid under certain conditions which will usually be fulfilled. The solution assumes that estimates are available for the elements in the joint occurrence matrices with recurring subscripts, analogous to item communality or reliability. Some alternative methods of obtaining these estimates are discussed. Finally a fictitious 3-class, 8-item example is presented in detail.


Introduction 


An important contribution to the theory of attitude measure- 
ment has recently been presented by Lazarsfeld* as part of a report 
of research in attitude measurement conducted for the Research 
Branch of the Information and Education Division of the Army 
Service Forces. Latent structure, as this innovation is called, is es- 
sentially a mathematical model for describing the interrelationships 
of items in an attitude questionnaire. 

There are actually two latent structure models, one in which the underlying attitude variable and the item distributions are assumed to be continuous, and another in which the underlying attitude is assumed to have a point distribution. In the latter model, the individuals at a given point in the distribution are termed, collectively, a latent class. Both models deal with items having two response alternatives. If an item has more than two response categories, these may be combined in such a way that a response dichotomy is available. One of the two alternative responses is arbitrarily designated as the positive response to the item. Latent structure analysis is concerned with the joint occurrence of such positive responses to items.

*Lazarsfeld, Paul. The logical and mathematical foundation of latent structure analysis. Stouffer, S., et al., Studies in social psychology in World War II, Vol. IV. Measurement and Prediction, Princeton: Princeton University Press, 1950.

Lazarsfeld distinguishes between the manifest, or observable 
data, and the latent, or derived parameters of the model. The actual 
solution of the equations relating the manifest data to the latent 
parameters has been found only for certain special cases. Very little 
has been done for the continuous distribution model, and solutions 
have been reported only for special cases of the latent class model.* 
It is the purpose of the present article to present a general solution 
for the latent parameters for the latent class model, under a few con- 
ditions which will often be fulfilled. 


The Latent Structure Equations 
In the latent class model, each of the $m$ points or latent classes is characterized by $n_s$, the proportion of people in latent class $s$, and $\nu_{is}$, the probability that a person in latent class $s$ will respond positively to item $i$. If $p_i$ is the proportion of people who respond positively to item $i$, we may write

$$p_i = \sum_{s=1}^{m} n_s \nu_{is}. \qquad (1)$$

The fundamental hypothesis of latent structure analysis is that all item interrelationships may be completely explained by the mutual relationship of all items to the underlying attitude distribution. In the latent class model this means that the items are assumed to be independent for each class $s$, while the item intercorrelations are assumed to be due to the varying latent item probabilities in the different latent classes. Then, if $p_{ij}$ is the proportion of people who respond positively to both items $i$ and $j$, or the relative number of joint occurrences of positive responses to items $i$ and $j$, we have

$$p_{ij} = \sum_{s=1}^{m} n_s \nu_{is} \nu_{js}. \qquad (2)$$

*In connection with a RAND project at Columbia University, Mr. W. A. Gibson has devised a graphical method of obtaining approximate solutions if certain configurational criteria are met, while Dr. Lazarsfeld and Mr. J. Dudman have solved special cases by the use of asymmetric determinants.

In general, if $p_{ij\cdots k}$ is the proportion of people who respond positively to all of the set of items $i, j, \ldots, k$,




















$$p_{ij\cdots k} = \sum_{s=1}^{m} n_s \nu_{is} \nu_{js} \cdots \nu_{ks}. \qquad (3)$$

Note finally that

$$\sum_{s=1}^{m} n_s = 1. \qquad (4)$$

Equations (1), (2), (3), and (4) express the manifest data as functions of the latent parameters $n_s$ and $\nu_{is}$.
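As a small numerical sketch of equations (1) to (4), the Python function below evaluates $p_{ij\cdots k}$ for any list of items. It assumes NumPy, stores $n_s$ in a vector `n` and $\nu_{is}$ in an $r \times m$ array `nu` (item 1 in row 0), and uses the 3-class, 8-item structure of the illustrative example later in the article, which reproduces the entries of Table 2; the function name `joint_proportion` is hypothetical. Passing a repeated item yields an element with recurring subscripts, such as $p_{11}$.

```python
import numpy as np

def joint_proportion(n, nu, items):
    """Equation (3): proportion of people positive on every item in `items`.

    n     -- length-m vector of class proportions n_s (summing to 1, equation (4))
    nu    -- r x m array; nu[i, s] is the probability that class s endorses item i+1
    items -- 0-based item indices; an empty list returns sum(n) = 1
    """
    prod = np.ones(len(n))
    for i in items:
        prod = prod * nu[i]            # nu_is * nu_js * ... * nu_ks for every class s
    return float(n @ prod)             # sum over s of n_s times that product

# Structure of the illustrative example: classes I, II, III in the columns.
n = np.array([0.5, 0.3, 0.2])
nu = np.array([[0.9, 0.5, 0.1],
               [0.8, 0.7, 0.6],
               [0.7, 0.0, 0.5],
               [0.2, 0.7, 0.2],
               [0.8, 0.7, 0.0],
               [0.5, 0.0, 0.0],
               [0.4, 0.2, 0.1],
               [0.5, 0.9, 0.3]])

p_1 = joint_proportion(n, nu, [0])        # equation (1)
p_12 = joint_proportion(n, nu, [0, 1])    # equation (2)
p_11 = joint_proportion(n, nu, [0, 0])    # recurring subscript: not observable
```

With these values $p_1 = .620$, $p_{12} = .477$ and $p_{11} = .482$, matching the corresponding entries of Table 2.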

These equations may be written in matrix form. Following Lazarsfeld, with a few changes in notation, we define:

$m$ = number of latent classes,
$s$ = subscript designating latent class $s$; $s = 1, 2, \ldots, m$,
$r$ = number of items,
$i$ = subscript designating item $i$; $i = 0, 1, 2, \ldots, r$, where $i = 0$ has a special meaning defined in each case below,
$\sigma$ = subscript denoting any subset of items, $i, j, \ldots, k$,
$N$ = $m \times m$ diagonal matrix, elements $n_s$,
$L$ = $(r+1) \times m$ matrix, elements $\nu_{is}$; $\nu_{0s} = 1$, and
$P_0$ = $(r+1) \times (r+1)$ symmetric matrix, elements $p_{ij}$; $p_{0i} = p_i$, $p_{00} = 1$.

(Let $p_{ii}$ remain undefined for the moment.)

$$P_0 = \begin{bmatrix} 1 & p_1 & p_2 & p_3 & \cdots & p_r \\ p_1 & p_{11} & p_{12} & p_{13} & \cdots & p_{1r} \\ p_2 & p_{21} & p_{22} & p_{23} & \cdots & p_{2r} \\ p_3 & p_{31} & p_{32} & p_{33} & \cdots & p_{3r} \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ p_r & p_{r1} & p_{r2} & p_{r3} & \cdots & p_{rr} \end{bmatrix}$$

$D_k$ = $m \times m$ diagonal matrix, elements $\nu_{ks}$, $k$ fixed; i.e., the $(k+1)$th row of $L$ is made into a diagonal matrix.
$P_k$ = $(r+1) \times (r+1)$ symmetric matrix, elements $p_{ijk}$, $k$ fixed; $p_{0jk} = p_{jk}$, $p_{00k} = p_k$.
$P_{kh}$ = $(r+1) \times (r+1)$ symmetric matrix, elements $p_{ijkh}$; $k$, $h$ fixed; $p_{0jkh} = p_{jkh}$, $p_{00kh} = p_{kh}$.


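Equations (1) to (4) are equivalent to

$$P_0 = L N L'. \qquad (5)$$

$$P_k = L N D_k L'. \qquad (6)$$

$$P_{kh} = L N D_k D_h L'. \qquad (7)$$

In general, we may write

$$P_\sigma = L N D_\sigma L', \quad \text{where } D_\sigma = \prod_g D_g, \; g \text{ ranging over the items in } \sigma, \text{ and } D_0 = I. \qquad (8)$$

Now define

$$A = L N^{1/2}, \qquad (9)$$

so (8) becomes

$$P_\sigma = A D_\sigma A'. \qquad (10)$$

Since $\nu_{0s} = 1$, it follows from (9) that $a_{0s} = \sqrt{n_s}$, while in general $a_{is} = \nu_{is}\sqrt{n_s}$. Thus it is sufficient to determine $A$, since from $A$, $N$ and $L$ are readily calculable.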
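The passage from $A$ back to $N$ and $L$ is easy to check numerically. This NumPy sketch (the helper names are hypothetical; the example structure is reused as input) forms $A = LN^{1/2}$, confirms that $AA'$ reproduces $P_0$, and inverts (9) with $n_s = a_{0s}^2$ and $\nu_{is} = a_{is}/a_{0s}$. It assumes the columns of $A$ may first be re-signed so that $a_{0s} > 0$, since a factoring routine does not fix those signs by itself.

```python
import numpy as np

n = np.array([0.5, 0.3, 0.2])
nu = np.array([[0.9, 0.5, 0.1], [0.8, 0.7, 0.6], [0.7, 0.0, 0.5],
               [0.2, 0.7, 0.2], [0.8, 0.7, 0.0], [0.5, 0.0, 0.0],
               [0.4, 0.2, 0.1], [0.5, 0.9, 0.3]])

def A_from_params(n, nu):
    """Equation (9): A = L N^(1/2), where row 0 of L holds nu_0s = 1."""
    L = np.vstack([np.ones(len(n)), nu])
    return L * np.sqrt(n)              # scales column s of L by sqrt(n_s)

def params_from_A(A):
    """Invert (9): a_0s = sqrt(n_s) and a_is = nu_is * sqrt(n_s)."""
    A = A * np.sign(A[0])              # make every a_0s positive
    return A[0] ** 2, A[1:] / A[0]

A = A_from_params(n, nu)
P0 = A @ A.T                           # equation (10) with D_0 = I, i.e. L N L'
n_back, nu_back = params_from_A(-A)    # a sign-flipped A gives the same parameters
```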


Two further matrices will be needed. Define

$$D_{(1)} = \sum_{k=1}^{r} D_k. \qquad (11)$$

The $s$th diagonal element in $D_{(1)}$ is the sum of all the elements but $\nu_{0s}$ in the $s$th column of $L$. Also define

$$P_{(1)} = \sum_{k=1}^{r} P_k = \sum_{k=1}^{r} A D_k A' = A\Big(\sum_{k=1}^{r} D_k\Big)A' = A D_{(1)} A'. \qquad (12)$$

The matrix $P_{(1)}$ contains all the triple order joint occurrence proportions $p_{ijk}$ in symmetric fashion. The use of such a matrix symmetric in all the items was suggested by Dr. Paul Horst.

The matrix equations imply manifest terms of the type $p_{ii}$, $p_{iij}$, $p_{iii}$, etc., terms in which the same item appears more than once in the subscript. We have called these terms the elements with recurring subscripts. These elements are actually not observable, but, for consistency, are considered to be defined by the general matrix equation (10). In the present solution it is assumed that estimates are available for these elements with recurring subscripts. Some methods of obtaining these estimates are discussed below.
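Equation (12) can be confirmed numerically by building each $P_k = AD_kA'$ and summing. The NumPy sketch below reuses the example structure (an assumption of this illustration; `P_sigma` is a hypothetical name) and checks that the sum equals $AD_{(1)}A'$, with $D_{(1)}$ holding the column sums of $L$ below row 0.

```python
import numpy as np

n = np.array([0.5, 0.3, 0.2])
nu = np.array([[0.9, 0.5, 0.1], [0.8, 0.7, 0.6], [0.7, 0.0, 0.5],
               [0.2, 0.7, 0.2], [0.8, 0.7, 0.0], [0.5, 0.0, 0.0],
               [0.4, 0.2, 0.1], [0.5, 0.9, 0.3]])
A = np.vstack([np.ones(3), nu]) * np.sqrt(n)      # equation (9)

def P_sigma(A, nu, items):
    """Equation (10): A D_sigma A', with D_sigma the product of D_g for g in sigma."""
    d = np.ones(A.shape[1])
    for g in items:
        d = d * nu[g]                              # diagonal of D_g is row g+1 of L
    return A @ np.diag(d) @ A.T

D1 = np.diag(nu.sum(axis=0))                       # equation (11)
P1 = A @ D1 @ A.T                                  # equation (12), closed form
P1_summed = sum(P_sigma(A, nu, [k]) for k in range(nu.shape[0]))
```

Here $D_{(1)} = \mathrm{diag}(4.8, 3.7, 1.8)$, and the (0, 0) entry of `P1` is $\sum_k p_k = 3.870$.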

Since the joint occurrence matrices form the basis for the solution for $A$, it is of theoretical and practical interest to state in matrix notation the relationship of the basic item response data to these matrices. We define

$X$ = $(r+1) \times n$ matrix, elements $x_{ia}$:
  $x_{ia} = 1$ if individual $a$ responds positively to item $i$,
  $x_{ia} = 0$ otherwise,
  $x_{0a} = 1$ for all $a$.
$U_k$ = $n \times n$ diagonal matrix, elements $x_{ka}$, $k$ fixed; i.e., a row of $X$ is made into a diagonal matrix.

$$U_{(1)} = \sum_{k=1}^{r} U_k.$$

The $a$th diagonal element in $U_{(1)}$, denoted $u_{a(1)}$, is the number of items to which individual $a$ responds positively. Now let


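$$n\hat{P}_0 = X X'; \qquad (13)$$

$$n\hat{P}_k = X U_k X'. \qquad (14)$$

In general,

$$n\hat{P}_\sigma = X U_\sigma X', \quad \text{where } U_\sigma = \prod_g U_g, \; g \text{ ranging over the items in } \sigma, \text{ and } U_0 = I. \qquad (15)$$

$\hat{P}_\sigma$ is equivalent to $P_\sigma$ except for the elements with recurring subscripts. Also let

$$n\hat{P}_{(1)} = n\sum_{k=1}^{r} \hat{P}_k = \sum_{k=1}^{r} X U_k X' = X\Big(\sum_{k=1}^{r} U_k\Big)X' = X U_{(1)} X'. \qquad (16)$$

Finally, we have

$$P_0 = \hat{P}_0 + E_0, \qquad (17)$$

where $E_0$ is an $(r+1) \times (r+1)$ diagonal correction matrix with diagonal elements

$$e_{ii(0)} = p_{ii} - p_i, \qquad e_{00(0)} = 0.$$

Also,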


$$P_{(1)} = \hat{P}_{(1)} + E_{(1)}, \qquad (18)$$

where $E_{(1)}$ is a correction matrix of order $(r+1) \times (r+1)$, with elements

$$e_{ij(1)} = p_{iij} + p_{ijj} - 2p_{ij} \quad (i \ne j);$$

$$e_{ii(1)} = \sum_{k=1}^{r} p_{iik} - \hat{\theta}_{ii} \quad (\hat{\theta}_{ii} \text{ is a diagonal element of } \hat{P}_{(1)});$$

$$e_{0i(1)} = p_{ii} - p_i;$$

$$e_{00(1)} = 0.$$

From equation (16) it is evident that the element in the $i$th row and $j$th column of $n\hat{P}_{(1)}$ is the sum of the $u_{a(1)}$ for those people who respond positively to both items $i$ and $j$. This can readily be obtained with an IBM sorter and tabulator.
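Equations (13) and (16) translate directly into array code. In this sketch the 0/1 response matrix is invented for illustration (3 items, 5 people), NumPy is assumed, and the variable `tabulated` reproduces the sorter-and-tabulator rule just stated for $n\hat{P}_{(1)}$.

```python
import numpy as np

# Hypothetical responses: rows are items 1..3, columns are individuals a = 1..5.
responses = np.array([[1, 1, 0, 1, 0],
                      [1, 0, 0, 1, 1],
                      [0, 1, 1, 1, 0]])
n_people = responses.shape[1]
X = np.vstack([np.ones(n_people), responses])     # row 0: x_0a = 1 for all a

P0_hat = X @ X.T / n_people                       # equation (13)
u1 = responses.sum(axis=0)                        # u_a(1): positives per individual
P1_hat = X @ np.diag(u1) @ X.T / n_people         # equation (16)

# Tabulating rule: sum u_a(1) over the people positive on both items 1 and 2.
both_positive = (responses[0] == 1) & (responses[1] == 1)
tabulated = u1[both_positive].sum()               # equals n * P1_hat[1, 2]
```

The diagonal of `P0_hat` simply repeats $p_i$ (because $x_{ia}^2 = x_{ia}$), which is why the elements with recurring subscripts must be estimated rather than read from the data.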


The Solution for the Latent Parameters 

The conditions under which the present solution is valid are that $m \le r + 1$, that $A$ be non-singular, and that all the diagonal elements of $D_{(1)}$ be different and non-zero. That is, $A$ must have no fewer rows than columns, the columns of $A$ must be linearly independent, and the sums of the latent parameters for the different latent classes must be different and non-zero. In addition, as we have said, it is necessary to know the elements with recurring subscripts.

Under these conditions it follows from equations (10) and (12) that $P_0$ and $P_{(1)}$ are of rank $m$. We may now operate on these matrices in the following manner.

From (10)

$$P_0 = A A'. \qquad (19)$$

By any factor method, as Thurstone’s or Hotelling’s, we may obtain

$$P_0 = B B'. \qquad (20)$$

From (19) and (20), and since $P_0$ is of rank $m$, we must have

$$B \Lambda_b = A, \quad \text{where } \Lambda_b \text{ is orthogonal}. \qquad (21)$$

Again by any factor method, we obtain

$$P_{(1)} = C C'. \qquad (22)$$

From (12)

$$P_{(1)} = A D_{(1)} A' = \big(A D_{(1)}^{1/2}\big)\big(A D_{(1)}^{1/2}\big)'. \qquad (23)$$

From (22) and (23), and since $P_{(1)}$ is of rank $m$, we must have

$$C \Lambda_c = A D_{(1)}^{1/2}, \quad \text{where } \Lambda_c \text{ is orthogonal}. \qquad (24)$$











From (21) and (24),

$$B \Lambda_b D_{(1)}^{1/2} = C \Lambda_c, \qquad (25)$$

$$B \Lambda_b D_{(1)}^{1/2} \Lambda_c' = C. \qquad (26)$$

Let

$$T = \Lambda_b D_{(1)}^{1/2} \Lambda_c'. \qquad (27)$$

From (26) and (27), we may solve for $T$ in

$$B T = C. \qquad (28)$$

The least-squares solution for $T$, found by minimizing the trace of $(BT - C)'(BT - C)$, is

$$T = (B'B)^{-1} B'C. \qquad (29)$$

This method of obtaining $T$ was pointed out by Dr. Paul Horst. From (27)

$$T T' = \Lambda_b D_{(1)} \Lambda_b'. \qquad (30)$$


Since $D_{(1)}$ is diagonal, and since its diagonal elements are all different and non-zero, it is unique except for order, and contains the characteristic roots of $TT'$. $\Lambda_b$ is also unique except for order, which is unimportant in this model, and for the signs of its columns, which may be fixed by requiring each $a_{0s} = \sqrt{n_s}$ to be positive. Thus a complete principal component analysis of $TT'$ will yield $\Lambda_b$, which, when premultiplied by $B$, will yield $A$.

The method of solution is to factor $P_0$ and $P_{(1)}$, obtaining $B$ and $C$ respectively. From $B$ and $C$ obtain $T$ by equation (29). Perform a complete principal component analysis of $TT'$ to obtain $\Lambda_b$. Obtain the product $B\Lambda_b$, which is $A$, the matrix containing the latent structure parameters.
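The four steps just listed can be sketched in NumPy. Two substitutions are assumptions of this sketch, not the author's procedure: principal axes from `numpy.linalg.eigh` stand in for Thurstone's diagonal and centroid methods, and the inputs are the exact $P_0$ and $P_{(1)}$ of the example structure, so the elements with recurring subscripts are taken as known. The function names `rank_m_factor` and `solve_latent_classes` are hypothetical.

```python
import numpy as np

def rank_m_factor(P, m):
    """Return F with P ~= F F', from the m largest eigenvalues (principal axes)."""
    vals, vecs = np.linalg.eigh(P)                 # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:m]
    return vecs[:, top] * np.sqrt(vals[top])

def solve_latent_classes(P0, P1, m):
    B = rank_m_factor(P0, m)                       # equation (20)
    C = rank_m_factor(P1, m)                       # equation (22)
    T, *_ = np.linalg.lstsq(B, C, rcond=None)      # equation (29), least squares
    roots, Lam_b = np.linalg.eigh(T @ T.T)         # (30): TT' = Lam_b D_(1) Lam_b'
    A = B @ Lam_b                                  # equation (21)
    A = A * np.sign(A[0])                          # column signs so that a_0s > 0
    return A[0] ** 2, A[1:] / A[0], roots          # n_s, nu_is, diagonal of D_(1)

# Exact matrices for the example structure (assumed known for this illustration).
n_true = np.array([0.5, 0.3, 0.2])
nu_true = np.array([[0.9, 0.5, 0.1], [0.8, 0.7, 0.6], [0.7, 0.0, 0.5],
                    [0.2, 0.7, 0.2], [0.8, 0.7, 0.0], [0.5, 0.0, 0.0],
                    [0.4, 0.2, 0.1], [0.5, 0.9, 0.3]])
A_true = np.vstack([np.ones(3), nu_true]) * np.sqrt(n_true)
P0 = A_true @ A_true.T
P1 = A_true @ np.diag(nu_true.sum(axis=0)) @ A_true.T

n_est, nu_est, roots = solve_latent_classes(P0, P1, m=3)
order = np.argsort(-n_est)                         # class order is arbitrary; sort by size
n_est, nu_est = n_est[order], nu_est[:, order]
```

With exact inputs the sketch returns $N$ and $L$ up to rounding error. With fallible data or estimated unknowns the roots of $TT'$ will depart from the column sums of $L$, as the article notes further on.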

After the foregoing was written, a simplification of the solution was pointed out by Dr. T. W. Anderson. From equation (29),

$$T T' = (B'B)^{-1} B' C C' B (B'B)^{-1}. \qquad (30a)$$

We may substitute in (30a) from equation (22), yielding

$$T T' = (B'B)^{-1} B' P_{(1)} B (B'B)^{-1}. \qquad (30b)$$

This implies that $P_{(1)}$ need not actually be factored. This solution involves factoring $P_0$ to obtain $B$, and obtaining $TT'$ by the matrix multiplication indicated in (30b). $\Lambda_b$ is obtained in the same manner in both methods of solution.
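Anderson's shortcut can be checked against the factored route. This sketch (NumPy assumed, the example structure as input, principal axes again standing in for the factoring methods of the text; `top_factor` is a hypothetical name) computes $TT'$ once from (29) with a factored $P_{(1)}$ and once from (30b) without factoring it. The two agree, and the roots are the column sums 4.8, 3.7, 1.8 of $L$.

```python
import numpy as np

n = np.array([0.5, 0.3, 0.2])
nu = np.array([[0.9, 0.5, 0.1], [0.8, 0.7, 0.6], [0.7, 0.0, 0.5],
               [0.2, 0.7, 0.2], [0.8, 0.7, 0.0], [0.5, 0.0, 0.0],
               [0.4, 0.2, 0.1], [0.5, 0.9, 0.3]])
A = np.vstack([np.ones(3), nu]) * np.sqrt(n)
P0 = A @ A.T
P1 = A @ np.diag(nu.sum(axis=0)) @ A.T

def top_factor(P, m=3):
    vals, vecs = np.linalg.eigh(P)                  # ascending eigenvalues
    return vecs[:, -m:] * np.sqrt(vals[-m:])        # a rank-m factor with P = F F'

B = top_factor(P0)
BtB_inv = np.linalg.inv(B.T @ B)

C = top_factor(P1)
T = BtB_inv @ B.T @ C                               # equation (29)
TTt_factored = T @ T.T

TTt_anderson = BtB_inv @ B.T @ P1 @ B @ BtB_inv     # equation (30b): P_(1) not factored
```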

An advantage of the factor analysis approach to the problem is that the number of classes can be determined in the factoring procedure. In most other approaches to the problem, the number of classes to be used must be estimated before starting the computations.


Since $P_0$ and $P_{(1)}$ are factored in obtaining the solution, the derived latent parameters may be expected to fit these matrices. However, the solution makes no use of the higher-order joint occurrence frequencies $p_{ijkh}$, etc. Thus, in order to be sure that the derived parameters fit the complete set of data, all joint occurrence matrices would have to be checked against their computed counterparts. In practice, a check of some joint occurrence frequencies picked at random would probably suffice to determine whether the latent class model could be used to summarize the data. For fallible data or estimated unknowns, the characteristic roots of $TT'$ will not be exactly the same as the sums of columns of $L$. The fit of the derived latent parameters is determined in part by this discrepancy as well as by the discrepancies of actual and computed joint occurrence matrices.


It should be noted that any higher order $P_\sigma$ could be used in place of $P_{(1)}$ in the solution, if its corresponding $D_\sigma$ satisfied the conditions imposed on $D_{(1)}$. It was felt that while many $D_\sigma$ might not satisfy these conditions, $D_{(1)}$ might ordinarily be expected to. Furthermore, since $P_{(1)}$ is symmetric in the items, it yields a unique solution.


This solution is not ideal. The development has been almost en- 
tirely in terms of algebra, with no statistical estimation procedures 
implied, nor any sampling theory. It would be desirable to find a so- 
lution which would be assured of fitting all the joint occurrence mat- 
rices in some “best” statistical sense, and which did not require esti- 
mating the elements with recurring subscripts. 


Estimation of Elements with Recurring Subscripts 

The elements with recurring subscripts in latent structure analysis are akin to the communalities in factor analysis; the estimation problems are similar in some respects. However, the necessity of estimating $p_{iij}$ and $p_{iii}$ poses additional complications. From the definition of $P_{(1)}$, it can be seen that there are many terms involved in it which must be estimated. While there are $r(r+1)^2$ terms represented in $P_{(1)}$, since each element represents the sum of $r$ terms, some of these terms are duplicates. There are actually $(r^3 + 5r)/6$ different terms observable, and $r^2 + r$ different terms to be estimated. For small numbers of items, there are about as many terms to be estimated as to be observed; in fact for $r < 7$, there are actually more terms to be estimated. It is clear then that we must either have good estimation procedures or restrict ourselves to a fairly large number of items.
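The counts in this paragraph can be checked by brute force. The Python sketch below (standard library only; `count_terms` is a hypothetical name) runs over every entry $(i, j)$ of $P_{(1)}$ and every $k$ in its sum, drops the subscript 0 (since $p_{0jk} = p_{jk}$ and $p_{00k} = p_k$), and sorts the distinct terms into observable ones and ones with a recurring subscript.

```python
from itertools import product

def count_terms(r):
    """Return (terms summed in P_(1), distinct observable terms, distinct terms to estimate)."""
    observable, to_estimate = set(), set()
    n_summed = 0
    for i, j, k in product(range(r + 1), range(r + 1), range(1, r + 1)):
        n_summed += 1
        term = tuple(sorted(x for x in (i, j, k) if x != 0))   # subscript 0 drops out
        if len(set(term)) < len(term):
            to_estimate.add(term)        # p_ii, p_iij or p_iii
        else:
            observable.add(term)         # p_i, p_ij or p_ijk with distinct items
    return n_summed, len(observable), len(to_estimate)

for r in range(3, 10):
    summed, obs, est = count_terms(r)
    print(f"r = {r}: {summed} terms summed, {obs} observable, {est} to be estimated")
```

For $r = 6$ this gives 41 observable terms against 42 to be estimated, and for $r = 7$ it gives 63 against 56, in line with the statement that estimated terms outnumber observed ones when $r < 7$.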


From equations (1) to (4) it is clear that limiting values for $p_{ii}$ are

$$p_i \ge p_{ii} \ge p_i^2. \qquad (31)$$

It can be shown that if the latent class probabilities for items $i$ and $j$, $\nu_{is}$ and $\nu_{js}$, are proportional, then the $i$th and $j$th columns of $P_0$ are proportional. If we take the first row of $P_0$ as its first factor, using Thurstone’s diagonal method, then the same type of proportionality applies to the first residual matrix, $\|p_{ij} - p_i p_j\|$; if $(\nu_{is} - \nu_{it})$ is proportional to $(\nu_{js} - \nu_{jt})$ for all $s$ and $t$, then the columns in the first residual matrix for items $i$ and $j$ are proportional. It is suggested that $P_0$ and the first residual matrix be inspected for proportional columns, and diagonal entries inserted to maintain the proportionalities. Otherwise it might be reasonable to take the highest $(p_{ij} - p_i p_j)$ as an estimate of $(p_{ii} - p_i^2)$.
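The fallback rule in the last sentence can be written in a few lines. This sketch assumes NumPy; it takes the first row of $P_0$ as the first factor, forms the residuals $p_{ij} - p_ip_j$, and uses the highest off-diagonal residual in each column as the estimate of $p_{ii} - p_i^2$. The small $P_0$ is invented for illustration, its unknown diagonal is marked with NaN, and `highest_residual_diagonal` is a hypothetical name. The proportionality inspection that the text prefers is not automated here.

```python
import numpy as np

def highest_residual_diagonal(P0):
    """Estimate p_ii for each item from the highest off-diagonal first residual."""
    p = P0[0, 1:]                                   # p_i from row 0 of P_0
    R = P0[1:, 1:] - np.outer(p, p)                 # first residuals p_ij - p_i p_j
    np.fill_diagonal(R, -np.inf)                    # the diagonal is unknown: skip it
    return R.max(axis=0) + p ** 2                   # estimate of p_ii

# Hypothetical 3-item P_0; the unknown diagonal entries p_ii are marked NaN.
P0 = np.array([[1.00, 0.50, 0.40, 0.60],
               [0.50, np.nan, 0.25, 0.33],
               [0.40, 0.25, np.nan, 0.26],
               [0.60, 0.33, 0.26, np.nan]])
p_ii_est = highest_residual_diagonal(P0)
```

The three estimates (.30, .21, .39) each fall within the limits (31), $p_i^2 \le p_{ii} \le p_i$.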


After $P_0$ has been factored, using an estimate of $p_{ii}$, a new estimate of $p_{ii}$ may be computed from the factor matrix. If this is quite different from the original estimate, and if the number of items is small, it may be best to refactor $P_0$ using the new estimate. However, for larger numbers of items, the discrepancies probably will not markedly affect the factor matrix.


Having the estimate of $p_{ii}$ from the factor matrix of $P_0$, we may proceed to the estimation of $p_{iij}$ and $p_{iii}$. From equations (1) to (4) it is clear that

$$p_{ij} \ge p_{iij}. \qquad (32)$$

Since all principal minors of $P_j$ must be non-negative, we have, taking the 0th and $i$th rows and columns,

$$\begin{vmatrix} p_j & p_{ij} \\ p_{ij} & p_{iij} \end{vmatrix} \ge 0. \qquad (33)$$

From (32) and (33) we have, as limits for $p_{iij}$,

$$p_{ij} \ge p_{iij} \ge \frac{p_{ij}^2}{p_j}. \qquad (34)$$











By thinking of $p_{iij}$ as a probability, we may write

$$p_{iij} = p_{ii|j}\, p_j, \qquad (35)$$

where $p_{ii|j}$ is the conditional probability of $ii$ given $j$. One plausible assumption is that

$$\frac{p_{ii|j}}{p_{ii}} = \frac{p_{i|j}}{p_i}. \qquad (36)$$

From this we obtain for an estimate of $p_{iij}$,

$$p_{iij} = \frac{p_{ii}\, p_{ij}}{p_i}. \qquad (37)$$

It may be verified that

$$2\,(p_{iij}\, p_i - p_{ii}\, p_{ij}) = \sum_{s=1}^{m}\sum_{t=1}^{m} n_s n_t \nu_{is} \nu_{it} (\nu_{is} - \nu_{it})(\nu_{js} - \nu_{jt}). \qquad (38)$$

Then $p_{iij} > p_{ii} p_{ij} / p_i$ if $\nu_{is} \gtrless \nu_{it}$ according as $\nu_{js} \gtrless \nu_{jt}$, which would seem to be the case in which $i$ and $j$ are positively related. If $i$ and $j$ are negatively related, probably $p_{iij} < p_{ii} p_{ij} / p_i$. From this it is suggested, in intuitive fashion only, that the estimate of $p_{iij}$ be taken as

$$p_{iij} = \frac{p_{ii}\, p_{ij}}{p_i}\,(1 + p_{ij} - p_i p_j). \qquad (39)$$
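Identity (38) is easy to spot-check numerically before relying on the sign argument. The sketch below (NumPy assumed; the class proportions and item probabilities are random, hypothetical values, and `identity_38` is a hypothetical name) evaluates both sides for two items, then takes $\nu_{js} = \sqrt{\nu_{is}}$, so that the two items are ordered alike across classes and every term on the right is non-negative.

```python
import numpy as np

def identity_38(n, nu_i, nu_j):
    """Return both sides of (38) for class proportions n and item probabilities nu_i, nu_j."""
    p_i = n @ nu_i
    p_ij = n @ (nu_i * nu_j)
    p_ii = n @ nu_i ** 2
    p_iij = n @ (nu_i ** 2 * nu_j)
    lhs = 2 * (p_iij * p_i - p_ii * p_ij)
    d_i = nu_i[:, None] - nu_i[None, :]             # nu_is - nu_it
    d_j = nu_j[:, None] - nu_j[None, :]             # nu_js - nu_jt
    rhs = np.sum(np.outer(n, n) * np.outer(nu_i, nu_i) * d_i * d_j)
    return lhs, rhs

rng = np.random.default_rng(0)
n = rng.dirichlet(np.ones(4))                       # hypothetical 4-class proportions
nu_i = rng.uniform(size=4)
lhs, rhs = identity_38(n, nu_i, rng.uniform(size=4))        # arbitrary item j
lhs_pos, rhs_pos = identity_38(n, nu_i, np.sqrt(nu_i))      # item j ordered like item i
```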


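Similarly, for $p_{iii}$ we have

$$p_{ii} \ge p_{iii} \ge \frac{p_{ii}^2}{p_i}. \qquad (40)$$

If

$$\frac{p_{ii|i}}{p_{ii}} = \frac{p_{i|i}}{p_i}, \quad \text{then} \quad p_{iii} = \frac{p_{ii}^2}{p_i}, \qquad (41)$$

or

$$p_{iii} = \frac{p_{ii}^2}{p_i}\,(1 + p_{ii} - p_i^2). \qquad (42)$$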




















In estimating $p_{iij}$ and $p_{iii}$ by equations (37) or (39), and (41) or (42), if the estimates fall outside the range of possible values, as given in (34) and (40), the nearer limit should be taken as the estimate.
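Equations (37), (39), (41) and (42), together with the rule of taking the nearer limit, can be sketched as two small Python functions (the names are hypothetical; no libraries are needed). The usage lines plug in items 1 and 2 of Table 2 ($p_1 = .620$, $p_2 = .730$, $p_{11} = .482$, $p_{12} = .477$).

```python
def clamp(value, low, high):
    """Take the nearer limit when value falls outside [low, high]."""
    return min(max(value, low), high)

def estimate_p_iij(p_i, p_j, p_ii, p_ij, adjusted=True):
    """p_iij by (37), or by (39) when adjusted, held within the limits (34)."""
    est = p_ii * p_ij / p_i                      # equation (37)
    if adjusted:
        est *= 1 + p_ij - p_i * p_j              # equation (39)
    return clamp(est, p_ij ** 2 / p_j, p_ij)

def estimate_p_iii(p_i, p_ii, adjusted=True):
    """p_iii by (41), or by (42) when adjusted, held within the limits (40)."""
    est = p_ii ** 2 / p_i                        # equation (41)
    if adjusted:
        est *= 1 + p_ii - p_i ** 2               # equation (42)
    return clamp(est, p_ii ** 2 / p_i, p_ii)

p_112 = estimate_p_iij(p_i=0.620, p_j=0.730, p_ii=0.482, p_ij=0.477)
p_111 = estimate_p_iii(p_i=0.620, p_ii=0.482)
```

These give about .380 and .411. For comparison, the values implied by the example structure of Table 1 are $p_{112} = .3777$ and $p_{111} = .4022$.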


When the elements with recurring subscripts have been estimated, the method of solution can be applied to the data to obtain the latent parameters. From these obtained parameters new estimates of the unknown elements may be computed, using the equations implied in (10), i.e.,

$$p_{ii} = \sum_{s=1}^{m} n_s \nu_{is}^2; \qquad (43)$$

$$p_{iij} = \sum_{s=1}^{m} n_s \nu_{is}^2 \nu_{js}; \qquad (44)$$

$$p_{iii} = \sum_{s=1}^{m} n_s \nu_{is}^3. \qquad (45)$$

If these values are somewhat different from the original estimates, the entire solution may be recomputed using the new estimates. From these calculations, a second set of derived latent parameters is available, from which a third set of estimates of the unknowns may be calculated. Using these, a third set of latent parameters may be derived. This iterative procedure should be continued until the estimates of the unknowns found from the last set of derived parameters are about the same as the estimates computed from the previous set of parameters.
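The re-estimation step applies (43) to (45) to the latest $N$ and $L$. A vectorized NumPy sketch follows; the function name is hypothetical, and the example structure serves as input only for illustration.

```python
import numpy as np

def recurring_elements(n, nu):
    """Equations (43)-(45) for all items at once.

    Returns p_ii (length r), p_iij as an r x r array with entry [i, j], and p_iii.
    """
    p_ii = nu ** 2 @ n                       # sum_s n_s nu_is^2
    p_iij = (nu ** 2 * n) @ nu.T             # sum_s n_s nu_is^2 nu_js
    p_iii = nu ** 3 @ n                      # sum_s n_s nu_is^3
    return p_ii, p_iij, p_iii

n = np.array([0.5, 0.3, 0.2])
nu = np.array([[0.9, 0.5, 0.1], [0.8, 0.7, 0.6], [0.7, 0.0, 0.5],
               [0.2, 0.7, 0.2], [0.8, 0.7, 0.0], [0.5, 0.0, 0.0],
               [0.4, 0.2, 0.1], [0.5, 0.9, 0.3]])
p_ii, p_iij, p_iii = recurring_elements(n, nu)
```

In an actual iteration `n` and `nu` would be the latest derived $N$ and $L$, and the loop would stop once successive estimates agree to whatever accuracy the analyst requires. The diagonal of `p_iij` reproduces `p_iii`.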


It may be remarked that in theory it is possible to actually compute the unknown elements if the ranks of $P_0$ and $P_j$ are known. One would proceed by using the properties of “links” and “basic” determinants as discussed by Lazarsfeld.* Equations involving asymmetric determinants with one unknown element could be set up. However, it seems likely that such a procedure would be quite cumbersome.

These methods for estimating the elements with recurring sub- 
scripts are offered as suggestions rather than as final statements. 
With more experience in using the model, and with the gradual accu- 
mulation of empirical results, it is hoped that investigators in the field 
will devise better practical estimation procedures. 


*Lazarsfeld, Paul, op. cit. 







Illustrative Example 
In order to illustrate the method of solution, a fictitious 3-class, 8-item example has been prepared. The hypothetical latent structure is presented in $N$ and $L'$, Table 1. (For convenience $N$ is written as a column vector.) In Tables 2 and 3 are presented the joint occurrence matrices $P_0$ and $P_{(1)}$, respectively. These matrices contain the true values of the $p_{ii}$, $p_{iij}$, and $p_{iii}$. However, since these values are unknown in any practical situation, estimated values have been substituted for these true values throughout the calculations.

TABLE 1
$N$ and $L'$

          $N$  |    0    1    2    3    4    5    6    7    8  | $\sum_{i=1}^{8}$
  I       .5   |  1.0   .9   .8   .7   .2   .8   .5   .4   .5  |  4.8
  II      .3   |  1.0   .5   .7   .0   .7   .7   .0   .2   .9  |  3.7
  III     .2   |  1.0   .1   .6   .5   .2   .0   .0   .1   .3  |  1.8

TABLE 2
$P_0$

         0      1      2      3      4      5      6      7      8
  0  1.000   .620   .730   .450   .350   .610   .250   .280   .580
  1   .620   .482   .477   .325   .199   .465   .225   .212   .366
  2   .730   .477   .539   .340   .251   .467   .200   .214   .425
  3   .450   .325   .340   .295   .090   .280   .175   .150   .205
  4   .350   .199   .251   .090   .175   .227   .050   .086   .251
  5   .610   .465   .467   .280   .227   .467   .200   .202   .389
  6   .250   .225   .200   .175   .050   .200   .125   .100   .125
  7   .280   .212   .214   .150   .086   .202   .100   .094   .160
  8   .580   .366   .425   .205   .251   .389   .125   .160   .386


The first factor of $P_0$ was obtained by Thurstone’s diagonal method, pivoting on the first row of $P_0$. In the residual matrix, whose elements are $(p_{ij} - p_i p_j)$, the highest off-diagonal entry in a column was taken as the estimate of $(p_{ii} - p_i^2)$, except for the first two columns where the property of proportionality was used. After two more factors were extracted by the centroid method the residuals were vanishingly small (the frequency distribution of residuals had mean = 0, standard deviation = .003). From the factor loadings, new estimates of the diagonal elements were obtained, which in turn were used to recompute the factor loadings. The factor matrix $B_1'$ is presented in Table 4. Note that we did not have to start with knowledge of the number of latent classes; the factor analysis procedure indicated clearly that three classes were sufficient.

TABLE 3
$P_{(1)}$

         0      1      2      3      4      5      6      7      8
  0  3.870  2.751  2.913  1.860  1.329  2.697  1.200  1.218  2.307
  1  2.751  2.225  2.138  1.530   .828  2.117  1.080   .979  1.590
  2  2.913  2.138  2.210  1.452   .971  2.080   .960   .945  1.724
  3  1.860  1.530  1.452  1.266   .372  1.344   .840   .690   .894
  4  1.329   .828   .971   .372   .654   .928   .240   .355   .961
  5  2.697  2.117  2.080  1.344   .928  2.080   .960   .923  1.659
  6  1.200  1.080   .960   .840   .240   .960   .600   .480   .600
  7  1.218   .979   .945   .690   .355   .923   .480   .432   .691
  8  2.307  1.590  1.724   .894   .961  1.659   .600   .691  1.532

TABLE 4
$B_1'$

         0      1      2      3      4      5      6      7      8
  I  1.000   .620   .730   .450   .350   .610   .250   .280   .580
  II     0   .268   .067   .255  -.169   .164   .252   .120  -.100
  III    0   .157   .041  -.137   .173   .249   .022   .048   .208











The $p_{ii}$ computed from $B_1$ were used to estimate the $p_{iij}$ and $p_{iii}$ by means of equations (34) and (37). Using these estimates in $P_{(1)}$, we obtained the factor matrix $C_1'$ by the diagonal and centroid methods, Table 5. Following the outlined procedure, $T_1$, $\Lambda_{b1}$, $\delta_1$, $N_1$, and $L_1'$ were obtained. Tables 6, 7, and 8 present these. ($\delta_1$ is a vector containing the characteristic roots of $T_1 T_1'$.)

TABLE 5
$C_1'$

         0      1      2      3      4      5      6      7      8
  I  1.967  1.397  1.481   .941   .679  1.368   .611   .620  1.175
  II     0   .466   .122   .530  -.358   .266   .504   .199  -.287
  III    0   .265   .052  -.261   .261   .381   .082   .064   .316


TABLE 6
$T_1$

  1.9648   -.0016   -.0185
   .4390   1.9306   -.0049
   .3862   -.2280   1.6207

TABLE 7
$\Lambda_{b1}$

   .733    .430    .526
   .656   -.651   -.382
   .178    .625   -.760

$\delta_1$

  4.806   3.620   2.183

TABLE 8
$N_1$ and $L_1'$

  $N_1$ |     0     1     2     3     4     5     6     7     8  | $\sum_{i=1}^{8}$
  .537  | 1.000  .898  .799  .645  .240  .816  .480  .400  .542  | 4.820
  .185  | 1.000  .442  .688 -.135  .858  .723 -.100  .167 1.035  | 3.678
  .277  | 1.000  .198  .622  .462  .222  .131  .036  .124  .352  | 2.147





From $N_1$ and $L_1$, new estimates of the unknowns were available. Since these estimates were considerably different from the original estimates, both sets of factors were recomputed, and $N_2$ and $L_2$ obtained. At this point the estimates of the unknown elements indicated that the factors of $P_0$ could not be improved, while the factor loadings from $P_{(1)}$ might be improved. Thus $B_3' = B_2'$, Table 9, while $P_{(1)}$ was refactored to obtain $C_3'$, Table 10. From these factor matrices, $T_3$, $\Lambda_{b3}$, $\delta_3$, $N_3$, and $L_3'$ were computed and are presented in Tables 11, 12, and 13.








TABLE 9
$B_3' = B_2'$

         0      1      2      3      4      5      6      7      8
  I  1.000   .620   .730   .450   .350   .610   .250   .280   .580
  II     0   .269   .068   .258  -.164   .170   .250   .117  -.098
  III    0   .157   .042  -.148   .168   .254   .018   .044   .206

TABLE 10
$C_3'$

         0      1      2      3      4      5      6      7      8
  I  1.967  1.398  1.481   .942   .677  1.372   .610   .619  1.174
  II     0   .467   .121   .553  -.379   .243   (entries for items 6 to 8 illegible)
  III    0   .238   .068  -.217   .257   .389   .028   .070   .313

TABLE 11
$T_3$

  1.9650    .0006   -.0002
   .4450   1.9485    .0021
   .3820   -.3476   1.5256

TABLE 12
$\Lambda_{b3}$

   .706    .526    .474
   .704   -.595   -.388
   .079    .608   -.790

$\delta_3$

  4.817   3.737   1.896

TABLE 13
$N_3$ and $L_3'$

  $N_3$ |     0     1     2     3     4     5     6     7     8  | $\sum_{i=1}^{8}$
  .498  | 1.000  .905  .803  .691  .205  .807  .501  .402  .506  | 4.820
  .277  | 1.000  .498  .702 -.008  .730  .711 -.011  .198  .930  | 3.750
  .225  | 1.000  .137  .603  .477  .205  .049  .015  .112  .316  | 1.914

















A comparison of $L'$ (Table 1) and $L_3'$ (Table 13) indicates close agreement. The necessity of the iterations in this example is probably due to the small number of items. The more items, the smaller the relative error in $P_0$ and $P_{(1)}$ due to the estimation of unknowns, so the closer $B$ and $C$ would be to their true values.

It should be noted that in some cases the roots of $TT'$ may be very similar, causing computational difficulties. It will usually be possible to spread out the values of these roots by reversing the positive and negative response designations for some items. Since the roots correspond to sums of columns of $L$, an inspection of the values in the first trial $L$ will indicate which items should be changed. (If the scoring of an item is reversed, $P_{(1)}$ must be recalculated, but for $P_0$, if the first row has been taken as the first factor, the elements in the corresponding row and column of the residual matrix, elements $(p_{ij} - p_i p_j)$, are merely reversed in sign. Hence the centroid factor loadings for that item are merely reversed in sign, while the first factor loading is changed from $p_i$ to $1 - p_i$.)
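The remark about reversed scoring can be verified on any 0/1 data. In this sketch the responses are random and purely hypothetical (NumPy assumed; `first_residuals` is a hypothetical name). Reversing item 3 changes its first factor loading to $1 - p_3$ and flips the sign of its off-diagonal residuals, while the residuals among the other items are untouched.

```python
import numpy as np

rng = np.random.default_rng(1)
responses = rng.integers(0, 2, size=(4, 200))     # hypothetical data: 4 items x 200 people

def first_residuals(resp):
    """p_i and the residuals p_ij - p_i p_j computed from 0/1 responses."""
    p = resp.mean(axis=1)
    P = resp @ resp.T / resp.shape[1]
    return p, P - np.outer(p, p)

p, R = first_residuals(responses)
reversed_resp = responses.copy()
reversed_resp[2] = 1 - reversed_resp[2]           # reverse the scoring of item 3
p_rev, R_rev = first_residuals(reversed_resp)

others = np.array([0, 1, 3])                      # the items other than item 3
```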


Summary 

The general formulation of latent structure analysis is presented, following Lazarsfeld, for the case in which the underlying attitude variable is assumed to have a point distribution. For this case, called the latent class model, the general matrix equation, (10), relating the manifest data to the latent parameters is stated. Also stated is the general matrix equation, (15), which relates the basic item response data to the joint occurrence matrices. Under the restrictions that $m \le r + 1$ (where $m$ is the number of latent classes and $r$ is the number of items), that $A$, (9), is non-singular, that $D_{(1)}$, (11), has diagonal entries all different and non-zero, and that the elements with recurring subscripts are known, a solution of equation (10) is presented: equations (20), (22), (29), (30), and (21). Some methods of obtaining estimates for the joint occurrences with recurring subscripts, such as $p_{ii}$, $p_{iij}$, and $p_{iii}$, are discussed. Finally a fictitious 8-item example is presented in detail.

It is hoped that further work at a theoretical level will disclose 
better solutions than the one presented here, and that a solution will 
be found for the potentially more powerful continuous distribution 
model. 


Manuscript received 7/26/50. 
Revised manuscript received 10/5/50. 














PSYCHOMETRIKA—VOL. 16, NO. 2 
JUNE, 1951 


TIME-LIMIT TESTS: ESTIMATING THEIR RELIABILITY 
AND DEGREE OF SPEEDING 


LEE J. CRONBACH AND W. G. WARRINGTON 
UNIVERSITY OF ILLINOIS 


Non-spurious methods are needed for estimating the coefficient
of equivalence for speeded tests from single-trial data. Spuriousness
in a split-half estimate depends on three conditions; the split-half
method may be used if any of these is demonstrated to be absent. A
lower-bounds formula, $r_c$, is developed. An empirical trial of
this coefficient and other bounds proposed by Gulliksen demonstrates
that, for moderately speeded tests, the coefficient of equivalence can
be determined approximately from single-trial data. It is proposed
that the degree to which tests are speeded be investigated explicitly,
and an index $\tau$ is advanced to define this concept.


Introduction 

Most group tests of mental ability and achievement are administered
with time limits. In some cases, the time limits are of no importance,
as nearly every subject completes all he can do correctly.
In other tests, the limits are short enough to make rate of work an
important factor in the score. Careful distinction between these types
of time-limit test has been lacking, and all time-limit tests have
generally been referred to as speed tests.

It is a maxim that the “reliability” of speed tests should not be
determined by single-trial procedures such as the split-half and
Kuder-Richardson formulas. This prohibition is presented in an
unqualified fashion in many texts, an example being this statement
(6, p. 69):

In a speeded test, it is impossible to determine a coefficient of
equivalence save by giving an immediate parallel test. The split-half and
Kuder-Richardson methods must not be used. When this principle is
disregarded, as it often is, and split-half methods are applied to speed
tests, the resulting coefficients are grossly inflated.


Comparable statements are made by Thorndike (16, pp. 86, 112), 
Adkins (1, p. 152), and Guilford (9, pp. 486, 496), who do, however, 
indicate incidentally that some time-limit tests are so little speeded 
that the internal-consistency methods may be used. 












Although such prohibitions have been voiced repeatedly, they are
as commonly violated. Many test authors, even in recent manuals,
report reliability estimates based on procedures known to give
spuriously high results. Some introduce additional misinterpretation, thus:
“The reliabilities ... were computed by means of the Kuder-Richardson
formula, which may underestimate the true reliability but should
never over-estimate it” (12). So well-informed a pair of investigators
as the Thurstones, in a major factorial study (15), base all their
reliability estimates on K-R Formula 20 (we shall call this coefficient
$\alpha$ in the rest of this paper), although many of the tests are speeded.

The prevalence of spurious coefficients and their interference
with test evaluation is suggested by the repeated criticism by Buros’
reviewers on this point (3, pp. 347, 410, 459, 518, 532, 630-631, 636).
While the tone of the reviews varies, each one echoes, in effect, the
words of E. K. Taylor (3, p. 631): “Since the use of the Kuder-Richardson
formulae on speed tests is entirely inappropriate, the reliability
of the instrument remains an unknown quantity.”

Essentially we have a situation where the test user is given
reliability coefficients which, according to accepted principles, are no
good at all, and which if anything do harm by making the tests look
more dependable than they are. Why does such a situation arise?
Partly we can blame authors and publishers for lack of scruple.
There is temptation to report spurious coefficients, since such data
make tests more salable. But we must also recognize that almost
prohibitive labor is involved in getting a non-spurious estimate of
reliability by usual methods. The Thurstones, for example, would have
had to double the amount of testing in their factorial study to get
better coefficients.

We cannot ignore the probability that disregard of the
commandment against spurious procedures stems (in part) from the
sweeping nature of the pronouncement. The injunction is frustrating
to authors and publishers because it blocks an easy technique.
Then consumers are educated to demand coefficients, and no reasonably
economical method for supplying them is sanctioned. Worst of
all, there is considerable evidence that the pronouncement is
unreasonable and unjustified in many instances. Where a test is only
slightly speeded, so that most people finish all they can do, split-half
or $\alpha$ coefficients are not misleading. In many instances where both
single-trial and two-trial coefficients have been determined, the
single-trial coefficients are not noticeably higher.


Whatever the causes, we lack interpretable reliability coefficients
for most time-limit tests. To relieve the situation, we can (a)
establish limits within which it is safe to rely on conventional
single-trial analyses for speeded tests or (b) develop usable single-trial
procedures which do not give spurious results. Gulliksen (10) has taken
steps in this direction, and offers several suggestions. We shall, in
this paper, demonstrate mathematically the nature and causes of
spuriousness in single-trial estimates and derive a lower-bounds
formula to correct for spuriousness. Then we shall compare results of
several procedures, including Gulliksen’s, applied to a set of tests
which vary as to speeding. We shall develop formally the concept of
degree of speeding. Finally, we arrive at working recommendations
for dealing with speeded tests.


A Rationale 


The reliability coefficient is supposed to indicate the stability
of a test score over a period of time, or the equivalence of two forms
of a test (7). The stability of a performance from day to day or
month to month cannot possibly be estimated from single-trial data.
Ordinarily we are concerned with obtaining a coefficient of
equivalence, a measure of the consistency of the person’s standing on two
measurements at the same time. If only one form of a test is
available, the split-half method or Kuder-Richardson formula $\alpha$ is used
to indicate the consistency of measurement.

But to determine the equivalence of two measures, we need two
or more sets of independent data. The parallel-form method, for
example, requires that the two tests be administered independently.
When we split a test in order to get two sets of scores, we are willing
to assume that the many items within an unspeeded test are
independent of one another. That is, especial difficulty on one of the items
neither increases nor decreases the person’s probable standing on the
remainder. But in a timed test, the person who gets stuck on one
item may never reach the remainder of the items. It is this
interdependence of items that introduces spuriousness. We turn now to
defining the conditions for spuriousness mathematically.

Suppose a test containing $n$ items ($i = 1, 2, \ldots, n$) is taken by
$N$ persons ($p = 1, 2, \ldots, N$). Then, on any trial, each person
completes $f_p$ items in the time allowed. If person $p$ attempts item $i$, he
earns a score $x_{ip}$ on the item. His test score is $x_p$:

$$x_p = \sum_{i=1}^{f_p} x_{ip}. \qquad (1)$$













In order to account for the possibility that some persons finish well
before time is called, we envision “invisible” items at the end of the
test, all of which are so difficult that everyone’s score on them is
inevitably zero. This device permits $f_p$ to be greater than $n$, and
permits $f_p$ to be greater for a person who finishes the test early than for
one who finishes late.

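When many independent trials on the same items are given, we
estimate increasingly accurately the person’s true number of items
finished, $f_{p\infty}$, and the true score on any item (when attempted),
$x_{ip\infty}$; that is, the means over trials approach these values:

$$\bar{f}_p \to f_{p\infty} \quad\text{and}\quad \bar{x}_{ip} \to x_{ip\infty}.$$

On some particular trial,

$$f_p = f_{p\infty} + \nu_p ,$$

and if item $i$ is finished by person $p$ (i.e., $i \le f_p$),

$$x_{ip} = x_{ip\infty} + \varepsilon_{ip} .$$

$\nu$ and $\varepsilon$ are error terms and may be positive or negative. Then

$$x_p = \sum_{i=1}^{f_{p\infty}+\nu_p} \left(x_{ip\infty} + \varepsilon_{ip}\right) = \sum_{i=1}^{f_{p\infty}} x_{ip\infty} + \sum_{i=f_{p\infty}}^{f_{p\infty}+\nu_p} x_{ip\infty} + \sum_{i=1}^{f_{p\infty}} \varepsilon_{ip} + \sum_{i=f_{p\infty}}^{f_{p\infty}+\nu_p} \varepsilon_{ip}. \qquad (2)$$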
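The bookkeeping in (2) can be checked numerically. The Python sketch below is illustrative only and rests on assumptions the text does not state: items are numbered from 1, the window sums are read as running over items $f_{p\infty}+1$ through $f_{p\infty}+\nu_p$ so that no item is counted twice, and a negative $\nu_p$ contributes the negated sum over the items the person fell short of. The function names are hypothetical.

```python
def window_sum(values, f_inf, nu):
    """Signed sum over the items gained (nu > 0) or lost (nu < 0) relative to item f_inf.

    values[i - 1] holds the value for item i; items are numbered from 1.
    """
    if nu >= 0:
        return sum(values[f_inf:f_inf + nu])   # items f_inf + 1 .. f_inf + nu
    return -sum(values[f_inf + nu:f_inf])      # items f_inf + nu + 1 .. f_inf, negated


def decompose_score(x_true, eps, f_inf, nu):
    """The four terms of equation (2) for one person on one trial."""
    return (
        sum(x_true[:f_inf]),             # true scores on items 1 .. f_inf
        window_sum(x_true, f_inf, nu),   # true scores gained or lost through nu
        sum(eps[:f_inf]),                # errors on items 1 .. f_inf
        window_sum(eps, f_inf, nu),      # errors on the gained or lost items
    )


def observed_score(x_true, eps, f_inf, nu):
    """Equation (1): total of (true score + error) over the f_inf + nu items finished."""
    finished = f_inf + nu
    return sum(t + e for t, e in zip(x_true[:finished], eps[:finished]))


x_true = [1, 1, 0.5, 0.5, 0, 0]   # the last two entries play the role of "invisible" items
eps = [0, -0.25, 0.5, -0.5, 0, 0]
print(sum(decompose_score(x_true, eps, 3, 1)), observed_score(x_true, eps, 3, 1))  # 2.75 2.75
```

With $f_{p\infty} = 3$, the same data give 1.75 for both sides when $\nu_p = -1$, so the four parts of (2) reproduce the observed score whichever way the number finished fluctuates.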
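Now the desired coefficient of equivalence is the correlation between
scores on two sets of items, administered independently. We
shall indicate parameters of the second test by primes (thus, $x'$,
$f'$, etc.). To make the tests comparable, we assume $x_{ip\infty} = x'_{ip\infty}$,
$M_x = M_{x'}$, $\sigma_x = \sigma_{x'}$, and $f_{p\infty} = f'_{p\infty}$. Then

$$x'_p = \sum_{i=1}^{f_{p\infty}} x_{ip\infty} + \sum_{i=f_{p\infty}}^{f_{p\infty}+\nu'_p} x_{ip\infty} + \sum_{i=1}^{f_{p\infty}} \varepsilon'_{ip} + \sum_{i=f_{p\infty}}^{f_{p\infty}+\nu'_p} \varepsilon'_{ip}, \qquad (3)$$

and

$$r_{xx'} = \frac{\frac{1}{N}\sum_{p=1}^{N} x_p x'_p - M_x M_{x'}}{\sigma_x \sigma_{x'}} = \frac{\frac{1}{N}\sum_{p=1}^{N} x_p x'_p - M_x^2}{\sigma_x^2}. \qquad (4)$$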




To examine the effect of violation of the assumption of
independence upon the correlation between tests, we expand the term
$\sum x x'$. To simplify notation, we drop the subscript $p$ hereafter.
Multiplying (2) by (3) and summing,














N 
Peg 
1 


faotv’ lx far’ 
=3[ Zia)* + S Vico ~ tin + 2 ia Se tr S View ~ &j 
i=1 


p 


(a) (b) (c) (d) 
faotV fw faotV faotv' faotv for+V Taotv’ 
+S tie 3 tint F tin 5 tet Stn it Btn Boi 
feo 
(e) (f) (g) (h) 
leo Taotv’ feo leo fo Taorv’ 
+Sadeut Fad tet saset hase 
1 co 
(i) (j) (k) (1) 
TotV faotv faotv’ Taot¥ lee Taotv Tot’ 
+ Sadat Pas tot ve Beit Paser| (5) 
1 few 


(m) “(n) “(o) (p) 


We assume that $\sum_{p=1}^{N} \nu_p = 0$ and $\sum_{p=1}^{N} \varepsilon_{ip} = 0$ for any trial.
(Throughout this analysis, we are dealing with population statistics,
and $N$ is accordingly large.) We also assume that $\nu_p$ and $\varepsilon_{ip}$ are
independent of each other and of $f_{p\infty}$ and $x_{ip\infty}$.

Whether the two tests are experimentally independent or not,
terms (c), (g), (h), (i), (j), (l), (n), and (o) reduce to zero as a
consequence of the definition of error as independent of true score,
and of the relation $\sum_p \varepsilon_{ip} = 0$. We also assume that
$\sum_p \varepsilon\,\varepsilon' = 0$; hence (k) = 0. This is the assumption involved in the
usual split-half method applied to unspeeded tests, namely, that
departures from true score on one item are independent of departures
on other items, when both items are reached by the subject.


$$\sum_p x x' = \sum_p (a + b + d + e + f + m + p). \qquad (6)$$


If $x$ and $x'$ are independent, $r_{\nu\nu'} = 0$ and (f) = 0. Then

$$\sum_p x x'_{\text{independent}} = \sum_p (a + b + d + e + m + p).$$

And, since $\sum(b) = \sum(e)$ and $\sum(d) = \sum(m)$,

$$\sum_p x x'_{\text{independent}} = \sum_p (a + 2b + 2d + p). \qquad (7)$$


If $x$ and $x'$ are not independent, being obtained from items
administered within a single time limit, $r_{\nu\nu'} \ne 0$. Then

$$\sum_p x x'_{\text{spurious}} = \sum_p (a + b + d + e + f + m + p).$$










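Collecting symmetrical terms,

$$\sum_p x x'_{\text{spurious}} = \sum_p (a + 2b + 2d + f + p). \qquad (8)$$

Subtracting (7) from (8),

$$D = \sum_p x x'_{\text{spurious}} - \sum_p x x'_{\text{independent}} = \sum_p (f) = \sum_p \sum_{i=f_\infty}^{f_\infty+\nu} x_{i\infty} \sum_{i=f_\infty}^{f_\infty+\nu'} x_{i\infty}. \qquad (9)$$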


$D$ indicates the spuriousness of a single-trial estimate. The
difference between $r_s$, the spurious single-trial estimate of equivalence,
and $r_i$, the comparable-form coefficient, is a monotonic function of
$D$. The right-hand member of (9) is a covariance which depends
on the magnitude of $\nu$, the correlation between $\nu$ and $\nu'$, and the size
of the $x_{i\infty}$ for items such that $f_\infty - |\nu| < i < f_\infty + |\nu|$. $D$ approaches
zero if

(a) $\nu$ approaches zero, i.e., there is little variation from trial
to trial in number of items finished by a person;
or
(b) $r_{\nu\nu'}$ approaches zero, i.e., fluctuations from true number of
items finished on one test are independent of fluctuations
on the other test;
or
(c) the $x_{i\infty}$ for items the person is working on as time runs
out approach zero.

As all persons come close to completing the test, variation in $\nu$
approaches zero, and in this case $D$ approaches zero.
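When true item scores and the fluctuations $\nu$ and $\nu'$ are known, as in a simulation, equation (9) can be evaluated directly. The Python sketch below is a hypothetical helper, not a procedure from the text; it assumes items are numbered from 1, that each window sum runs over items $f_\infty+1$ through $f_\infty+\nu$, and that a negative $\nu$ gives the negated sum over the items the person fell short of.

```python
def window_sum(true_scores, f_inf, nu):
    """Signed sum of true item scores between item f_inf and item f_inf + nu.

    true_scores[i - 1] is the true score on item i; items are numbered from 1.
    A negative nu returns the negated sum over the items the person fell short of.
    """
    if nu >= 0:
        return sum(true_scores[f_inf:f_inf + nu])
    return -sum(true_scores[f_inf + nu:f_inf])


def spuriousness_d(people):
    """Equation (9): sum over persons of the product of the nu and nu' window sums.

    Each element of `people` is a tuple (true_scores, f_inf, nu, nu_prime).
    """
    return sum(
        window_sum(x, f_inf, nu) * window_sum(x, f_inf, nu_prime)
        for x, f_inf, nu, nu_prime in people
    )
```

For example, with true scores [1, 1, 1, 0.5, 0.5, 0], two persons at $f_\infty = 3$ with $\nu = \nu' = 1$ and $\nu = \nu' = -1$ give $D = 1.25$. Setting every $\nu$ to zero (condition (a)), or giving the items around $f_\infty$ true scores of zero (condition (c)), drives $D$ to zero.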


Conditions required for single-trial estimates to be acceptable
without correction. If any one of the following conditions is
satisfied, spuriousness is negligible and the single-trial methods
(split-half or $\alpha$) may be used.

(1) The variation in number of items finished is small.

(2) When time is called, subjects have completed all the items
which they have an appreciable chance of getting right. (Or, if a
correction formula is used in scoring the test, subjects have
completed all items where their probability of success is greater than
chance.)










(3) Fluctuations in number finished from trial to trial are
small, compared to variation in number finished from person to
person.


Single-trial data do not permit us to estimate fluctuations in
number finished. But $\sigma_\nu \le \sigma_f$, and we may sometimes demonstrate
that condition (1) holds by showing that $\sigma_f$ approaches zero. To
demonstrate condition (2), we may compute the success of each
person on the last one or two items attempted; if this approaches zero,
condition (2) is satisfied. These methods provide a basis for
demonstrating, under certain circumstances, that a single-trial estimate of
reliability is a good estimate of the value to be obtained from
independent forms. On the other hand, we cannot conclude that the
estimate is necessarily spurious in cases where neither condition (1)
nor (2) obviously obtains. The above approach leaves something to
be desired, for it is expressed in the indefinite words “small,”
“appreciable,” and the like.
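The two checks just described are easy to automate. The sketch below assumes a hypothetical input layout, one list per person of 0/1 scores on the items he attempted, in order (so the list length is his $f_p$); the function name is also hypothetical. It returns $\sigma_f$, which bounds $\sigma_\nu$ from above, and the group's mean success on the last one or two items attempted.

```python
from statistics import fmean, pstdev


def speeding_checks(attempted_scores, last_k=2):
    """Return (sigma_f, mean success on each person's last `last_k` attempted items).

    attempted_scores: one list per person of 0/1 scores on the items the
    person attempted, in order; the length of each list is that person's f_p.
    """
    finished = [len(scores) for scores in attempted_scores]
    sigma_f = pstdev(finished)  # population SD of number finished
    tail_means = [fmean(scores[-last_k:]) for scores in attempted_scores if scores]
    return sigma_f, fmean(tail_means)
```

A value of $\sigma_f$ near zero points toward condition (1), and a mean tail success near zero points toward condition (2); how near counts as "near" remains a matter of judgment, as the text notes.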


Estimating Bounds to the True Coefficient

If we can obtain an upper bound for $D$, we can correct the spurious
coefficient of equivalence ($r_s$) to get a lower bound for the true
coefficient from independent trials ($r_i$).





From (4),

$$r_s - r_i = \frac{D}{N\sigma_x^2}, \qquad (10)$$

$$D = \sum_p \sum_{i=f_\infty}^{f_\infty+\nu} x_{i\infty} \sum_{i=f_\infty}^{f_\infty+\nu'} x_{i\infty}. \qquad (9)$$

We now assume that

$$\sum_{i=f_\infty}^{f_\infty+\nu} x_{i\infty} \le \nu\, x_{f_\infty}, \qquad (11)$$

where $x_{f_\infty}$ is the person’s true score on the $f_\infty$th item. This
assumption becomes true when items are arranged in order of difficulty,
and the true order of difficulty of items is the same for each person.
Should the latter condition not be true, departures from (11) will
nevertheless average out over the group of subjects.


$$D \le \sum_{p=1}^{N} x_{f_\infty}^2\, \nu\nu'. \qquad (12)$$


Now there are $N_a$ cases for whom $f_\infty \le n$, and for them the term
within the summation of (12) is not zero. For the remaining cases
($p = N_a + 1, \ldots, N$), whose last item is one of the “invisible” items beyond
the $n$th, $x_{f_\infty}$ is zero and the term within the summation is zero.

Therefore,

$$D \le \sum_{p=1}^{N_a} x_{f_\infty}^2\, \nu\nu'. \qquad (13)$$


Now, assume $\nu' = \nu$, i.e., $r_{\nu\nu'} = 1.00$. This overestimates the degree
of spuriousness. Since $\nu$ is independent of $x_{f_\infty}$,

$$D \le \frac{1}{N_a} \sum_{p=1}^{N_a} \nu^2 \sum_{p=1}^{N_a} x_{f_\infty}^2. \qquad (14)$$
al 1 


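If we let $\sigma_{\nu(a)}$ represent the standard deviation of $\nu$ for the first $N_a$
cases,

$$D \le \sigma_{\nu(a)}^2 \sum_{p=1}^{N_a} x_{f_\infty}^2, \qquad (15)$$

$$r_i \ge r_s - \frac{\sigma_{\nu(a)}^2 \sum_{p=1}^{N_a} x_{f_\infty}^2}{N\sigma_x^2}. \qquad (16)$$

In order to estimate this, we make use of the relation $\sigma_\nu \le \sigma_f$:

$$r_i \ge r_s - \frac{\sigma_{f(a)}^2 \sum_{p=1}^{N_a} x_{f_\infty}^2}{N\sigma_x^2}. \qquad (17)$$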
Since $N_a$ is defined in terms of true number finished, single-trial data
permit us only to make approximations to the above correction. We
may assume, however, that the number of cases whose true number
finished is $n$ or fewer is approximately the same as the number whose
obtained number finished is $n$ or fewer (where, as before, we define
number finished in terms of the “invisible” items). Our data do not
show clearly whether the person who reached the end of the test did
so with time to spare, so that his last item is one of the invisible ones,
or whether he barely finished. We propose to estimate the number
of cases who finished the $n$th item, but did not finish the $(n+1)$th
invisible item, by assuming that this number equals the number who
completed the $(n-1)$th item but not the $n$th. (A more elaborate
method, involving extrapolation from the several preceding items,
seems unnecessary.) We therefore determine $\sigma_{f(a)}^2$ from the
frequency distribution of number of items finished, where the frequency
at $n$ items is set equal to the frequency at $n-1$ items.


Estimating $\sum_{p=1}^{N_a} x_{f_\infty}^2$ is slightly more difficult. Neither $f_\infty$ nor
$x_{i\infty}$ is known. We therefore make our estimate in terms of $x_{Ap}$, the
person’s mean obtained score on the last items finished. Because the
mean true score, over many persons, equals the mean obtained score,
and because the variance of obtained scores is greater than the
variance of true scores,

$$\sum_{p=1}^{N_a} x_{f_\infty}^2 \le \sum_{p=1}^{N_a} x_{Ap}^2. \qquad (18)$$

$x_{Ap}$ might be estimated from only the last item, but this would add
considerable error variance to the estimate. As more items are
averaged to get $x_{Ap}$, the estimate becomes closer to the true score, making
the members of (18) less unequal. But as still more items are added,
so that the easier items nearer the start of the test are included, the
estimated $x_{Ap}$ again becomes larger and the inequality more extreme.
We have therefore estimated $x_{Ap}$ by averaging each person’s score on
the last two items reached. Because we wish to sum over $N_a$ cases,
the value of $x_{Ap}^2$ is obtained for each person who reaches the end of
the test, and averaged. This value is then multiplied by the number
of persons reaching only $n-1$ items, and entered in the total together
with the values of $x_{Ap}^2$ for persons completing $n-1$ or fewer items.


$$r_i \ge r_c = r_s - \frac{\sigma_{f(a)}^2 \sum_{p=1}^{N_a} x_{Ap}^2}{N\sigma_x^2}. \qquad (19)$$

The desired coefficient of equivalence, $r_i$, lies between $r_s$ and $r_c$.
Because a great many inequalities are introduced in deriving (19),
the lower bound will often be far from $r_i$. There seems to be no
purpose in obtaining a lower bound which is far below the upper bound,
so that the coefficient is essentially undetermined. But in instances
where $D$, as estimated, is small, the upper and lower bounds will be
close together and $r_i$ can be inferred satisfactorily from single-trial
data. It is therefore recommended that equation (19) be used in
evaluating single-trial data for time-limit tests. Our derivation
depends upon a large number of cases, and large samples should be
used in practical work with the formula. If the test is essentially
unspeeded, $r_s$ and $r_c$ will be close together and a confident report
regarding the coefficient of equivalence can be made. If the bounds are
widely separated, no useful conclusion is possible.

The computational procedures leading to $r_c$ are not complicated.
The steps are as follows:

1. Determine number of items finished ($f_p$) for each person.













2. Make a frequency distribution for the number of persons
having each number of items finished, entering the value for $n-1$
items opposite $n$ items also. From this, compute $\sigma_{f(a)}^2$.

3. For the next step, errors should be marked on the answer
sheet with a colored pencil. For each person, average his score on
the last two items and square the value obtained. (This square will
be .00, .25, or 1.00.) If $N_0$ persons complete the test, and $N_1$
persons complete all but the last item, sum the $x_{Ap}^2$ for the $N_0$ persons
and multiply by $N_1/N_0$. To this value, add the $x_{Ap}^2$ for all persons
not finishing the test.

4. Enter these values in (19), together with $\sigma_x^2$ obtained in
the usual fashion, to get $r_c$.
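Steps 1 to 4 can be collected into a single function. This is a sketch under stated assumptions, not the authors' worksheet: it uses population variances (in keeping with the population treatment above), assumes 0/1 item scoring without a correction for guessing, and takes each person's scores on his last two items reached as already read off the answer sheet. The function and argument names are hypothetical, and $r_s$ must be supplied from an ordinary single-trial split-half analysis.

```python
from statistics import pvariance


def lower_bound_rc(r_s, n_items, finished, last_two, totals):
    """Equation (19): r_c = r_s - var_f(a) * sum(x_Ap ** 2) / (N * var_x).

    r_s      -- spurious single-trial split-half coefficient
    n_items  -- n, the number of items in the test
    finished -- f_p for each person (step 1)
    last_two -- each person's (0/1, 0/1) scores on the last two items reached
    totals   -- each person's test score x_p
    """
    n_persons = len(totals)
    var_x = pvariance(totals)

    # Step 2: distribution of f with the frequency at n replaced by the frequency at n - 1.
    n_1 = sum(1 for f in finished if f == n_items - 1)   # N_1: finished all but the last item
    n_0 = sum(1 for f in finished if f == n_items)       # N_0: finished the test
    f_a = [f for f in finished if f < n_items] + [n_items] * n_1
    if not f_a:
        return r_s  # nobody falls in the N_a group, so no correction is estimated
    var_f_a = pvariance(f_a)

    # Step 3: x_Ap squared is .00, .25, or 1.00 for 0/1 items.
    x_ap_sq = [((a + b) / 2) ** 2 for a, b in last_two]
    finishers = sum(s for s, f in zip(x_ap_sq, finished) if f == n_items)
    others = sum(s for s, f in zip(x_ap_sq, finished) if f < n_items)
    sum_x_ap_sq = (finishers * n_1 / n_0 if n_0 else 0.0) + others

    # Step 4: apply (19).
    return r_s - var_f_a * sum_x_ap_sq / (n_persons * var_x)
```

For five persons on a 4-item test with $r_s = .90$, numbers finished [4, 4, 3, 2, 3], last-two scores [(1, 1), (1, 0), (1, 0), (0, 0), (1, 1)] and test scores [4, 3, 2, 0, 3], the sketch gives $r_c \approx .748$.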


Gulliksen’s Formulas and Modifications of Them

Gulliksen (10) has recently derived three lower-bounds formulas
intended to serve as single-trial estimates of reliability for speeded
tests. These formulas are:

$$r_\sigma = 1 - \frac{\sigma_f^2}{\sigma_x^2}, \qquad (20)$$

$$r_U = 1 - \frac{n - M_f}{\sigma_x^2}, \qquad (21)$$

$$r_x = 1 - \frac{n(n - M_f) - (n - M_f)^2 - \sigma_f^2}{(n-1)\,\sigma_x^2}. \qquad (22)$$

Gulliksen states that $r_x > r_U > r_\sigma$, but since variance in unfinished
may be small compared to mean unfinished, this is a relation which
tends to be true rather than a mathematical necessity. In these
formulas, $f$ refers to number of items finished, with no reference to the
hypothetical “invisible” items included within $f$ in our rationale.

Gulliksen recognizes that a better lower bound might be obtained
if $n$, the total number of items, were replaced by an estimate
of the number of items actively differentiating between persons.
Items which everyone reaches, and items which no one reaches, play
no part in the variance of $f$. We have developed this suggestion into
two additional formulas, each of which would be expected to give a
better lower bound than $r_x$. If $k$ is the greatest number of items
completed by any subject, and if $m$ is the smallest number completed
by any subject,

$$r_{x'} = 1 - \frac{k(k - M_f) - (k - M_f)^2 - \sigma_f^2}{(k-1)\,\sigma_x^2} \qquad (23)$$



















and

$$r_{x''} = 1 - \frac{(k-m)(k - M_f) - (k - M_f)^2 - \sigma_f^2}{(k-m-1)\,\sigma_x^2}, \qquad (24)$$

$$r_{x''} \ge r_{x'} \ge r_x .$$

From (21), we have the similarly derived formula

$$r_{U'} = 1 - \frac{k - M_f}{\sigma_x^2}, \qquad (25)$$

$$r_{U'} \ge r_U .$$

Since the precise values of $k$ and $m$ depend on only a single case,
formulas (23), (24), and (25) are subject to marked fluctuation
from sample to sample. As in the case of $r_c$, $r_{x'}$ and $r_{x''}$ should be
based on large samples.
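For comparison with $r_c$, the Gulliksen-type bounds (20) to (25) need only each person's test score and number finished. The Python sketch below follows those equations, including the reading of (24) in which $k - m$ items are treated as differentiating and $k - M_f$ is the mean number of them left unfinished; the dictionary keys are hypothetical labels and population variances are assumed. Formula (24) is skipped when $k - m < 2$, since its denominator would then be zero or negative.

```python
from statistics import fmean, pvariance


def gulliksen_type_bounds(totals, finished, n_items):
    """Lower bounds (20)-(25) from test scores x_p and numbers finished f_p."""
    var_x = pvariance(totals)
    var_f = pvariance(finished)
    mean_f = fmean(finished)
    k, m = max(finished), min(finished)

    def kr21_like(length, mean_unfinished):
        # Shared form of (22)-(24): `length` items, mean number unfinished among them.
        error = length * mean_unfinished - mean_unfinished ** 2 - var_f
        return 1 - error / ((length - 1) * var_x)

    bounds = {
        "r_sigma": 1 - var_f / var_x,                 # (20)
        "r_U": 1 - (n_items - mean_f) / var_x,        # (21)
        "r_x": kr21_like(n_items, n_items - mean_f),  # (22)
        "r_x_prime": kr21_like(k, k - mean_f),        # (23)
        "r_U_prime": 1 - (k - mean_f) / var_x,        # (25)
    }
    if k - m >= 2:
        bounds["r_x_double_prime"] = kr21_like(k - m, k - mean_f)  # (24)
    return bounds
```

For scores [10, 8, 6, 4, 2] and numbers finished [20, 18, 16, 15, 14] on a 20-item test, the sketch gives $r_\sigma = .42$, $r_U = .575$, $r_x \approx .659$ and $r_{x''} = .895$; as the text cautions, such values from very small samples are unstable.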

Gulliksen’s initial formulation assumes that error variance on
the test ($E_x^2$, in our notation) may be broken into two independent
portions, $E_W^2$ based on items marked incorrectly and $E_U^2$ (our $\nu$)
based on items left unfinished. His statement,

$$E_x^2 = E_W^2 + E_U^2, \qquad (26)$$

disregards the probable negative correlation between $E_W$ and $\nu$. In
correspondence, he points out that such a correlation probably has
little effect on the reliability estimate. In any case the more correct
statement,

$$E_x^2 \le E_W^2 + \sigma_\nu^2, \qquad (27)$$

is not inconsistent with Gulliksen’s formulas. The second point to be
mentioned is that, in deriving $r_U$, Gulliksen employs the relation
$\sigma_\nu^2 \le M_U$ ($U$ being the number of items unfinished, $n - f$). In deriving
this he implicitly assumes that the correlation between reaching
items $g$ and $h$ within the same trial ($r_{g_1h_1}$) is negligibly greater, over
all item-pairs, than the correlation when the items are independently
administered ($r_{g_1h_2}$). Gulliksen’s derivation of $r_x$ is based on a
similar assumption. Correspondence with Gulliksen clarifies the point
that he justifies such an assumption on the empirical grounds that
any spuriousness of the type $r_{g_1h_1} > r_{g_1h_2}$ has a negligible effect on
the reliability coefficient. This seems probable at first glance, and
our empirical data tend to support this view. The Gulliksen formulas
would rest on a sounder base if the conditions required for this
assumption to hold were examined.













Empirical Analysis 

We have made an empirical analysis of a set of test data in
order to answer the question: Just how inaccurate are single-trial
estimates of reliability for speeded tests? Much of this work was done
before the foregoing rationale was developed, but in any case
empirical evidence is needed to demonstrate forcefully to test authors what
risk they run of misleading readers when they publish such
coefficients. The analysis also permits us to try the newly proposed
lower-bounds formulas, to evaluate their usefulness.

The data treated were made available by Dr. Merle Tate. He
administered four mental tests individually, noting the time required
by each person to complete each item (14). The 36 high-school
students used as subjects were directed to work for both speed and
accuracy. A maximum time of three minutes was allowed for any item,
but this limit was rarely reached. He had four tests, ranging from
60 to 64 items, dealing with arithmetic reasoning, number series,
sentence completion (vocabulary), and spatial relations. His items
were drawn from typical mental tests, but his experimental
procedure may make the results somewhat different from results
obtainable in ordinary group testing.

We obtain a test score for each student for any given time limit.
By determining the cumulative time on successive items, we
determine how many items he would have finished by 1200 seconds, for
example; then we count how many of those he solved correctly.

The “true” coefficient of equivalence, $r_i$, was obtained by
dividing the test into comparable halves, and adding the times for each
student as if the two halves had been administered independently.
Guttman’s split-half formula (11) was used to obtain the reliability.
The spurious split-half reliability, $r_s$, was obtained when the two
half-tests were timed within a single time limit. Results for some of the
time limits employed are presented in Table 1 and graphed in Figure 1.
(The times used for Test IV differ from those for the other tests because
our procedure was changed after treating that set of data.) It should
be noted that our sample is small, so that our results are markedly
influenced by sampling error.
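The scoring procedure used with Tate's data can be sketched as follows. Several details are assumptions made only for illustration, because the text does not state them: the input format (for each person, a list of (seconds, correct) pairs in the order the items were taken), an odd-even split into halves, and an allowance of half the total limit to each separately timed half. Guttman's split-half formula is taken in its usual form, $r = 2[1 - (\sigma_a^2 + \sigma_b^2)/\sigma_x^2]$, with population variances. All function names are hypothetical.

```python
from statistics import pvariance


def score_within_limit(items, limit):
    """Number right among the items completed within `limit` seconds."""
    elapsed, score = 0.0, 0
    for seconds, correct in items:
        elapsed += seconds
        if elapsed > limit:
            break
        score += correct
    return score


def guttman_split_half(a, b):
    """Guttman's split-half coefficient from half-test scores a and b."""
    total = [x + y for x, y in zip(a, b)]
    return 2 * (1 - (pvariance(a) + pvariance(b)) / pvariance(total))


def spurious_and_independent(people, limit):
    """Return (r_s, r_i) for one time limit.

    r_s: both halves scored from one clock running over the whole test.
    r_i: each half scored on its own clock (assumed here to get limit / 2).
    """
    joint_a, joint_b, sep_a, sep_b = [], [], [], []
    for items in people:
        a = b = 0
        elapsed = 0.0
        for index, (seconds, correct) in enumerate(items):
            elapsed += seconds
            if elapsed > limit:
                break
            if index % 2 == 0:
                a += correct
            else:
                b += correct
        joint_a.append(a)
        joint_b.append(b)
        sep_a.append(score_within_limit(items[0::2], limit / 2))
        sep_b.append(score_within_limit(items[1::2], limit / 2))
    return guttman_split_half(joint_a, joint_b), guttman_split_half(sep_a, sep_b)
```

The jointly timed halves share one clock, so a person who works slowly loses items from both halves at once; this is the interdependence that the rationale identifies as the source of spuriousness.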

These conclusions follow: 


1. Under some circumstances, single-trial split-half estimates
are much higher than the true coefficient. This is markedly seen in
Tests III and IV, with short time limits. Split-half single-trial
estimates cannot be usefully interpreted unless they are also accompanied
by evidence that the degree of spuriousness is negligible.
















2. The spuriousness declines as the time is extended, and is 
negligible even when some students have not finished the test. 


3. For the present tests, spuriousness is small when the time
limit allows an average of 30-40 seconds or more per item.


4. For Tests I and II, the single-trial estimate is not markedly 
spurious even when very short time-limits are imposed. 


The Kuder-Richardson formulas have been used by some test
authors in the belief that these formulas lead to underestimates and
so will not give spuriously high coefficients. But the factors that
cause a split-half coefficient to be spuriously high also operate in the
Kuder-Richardson formulas, and in fact, $\alpha$ is the mean of all possible
split-half coefficients for the given test (5). When the
Kuder-Richardson formulas were applied in the usual manner to the Tate tests,
the results shown in Table 2 were obtained.
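For reference, the two Kuder-Richardson coefficients reported in Table 2 can be computed as below. This is a sketch assuming dichotomous (0/1) item scores with unreached items scored 0, which is exactly the practice that lets speeding inflate $\alpha$; population variances are used and the function names are hypothetical. KR20 is $\frac{n}{n-1}\left(1 - \sum p_j q_j / \sigma_x^2\right)$ and KR21 is $\frac{n}{n-1}\left(1 - M(n - M)/(n\sigma_x^2)\right)$.

```python
from statistics import fmean, pvariance


def kr20(item_matrix):
    """K-R Formula 20 (alpha) from one row of 0/1 item scores per person."""
    n = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    var_x = pvariance(totals)
    sum_pq = 0.0
    for j in range(n):
        p = fmean(row[j] for row in item_matrix)  # proportion passing item j
        sum_pq += p * (1 - p)
    return n / (n - 1) * (1 - sum_pq / var_x)


def kr21(totals, n):
    """K-R Formula 21 from test scores alone, assuming equal item difficulties."""
    mean = fmean(totals)
    return n / (n - 1) * (1 - mean * (n - mean) / (n * pvariance(totals)))
```

For the four response patterns [1, 1, 1], [1, 1, 0], [1, 0, 0] and [0, 0, 0], the sketch gives KR20 = .75 and KR21 = .60.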


TABLE 1

True and Spurious Coefficients of Equivalence
with Varying Time Limits

                                     Time (seconds)
                                   200    400    800   1600   2400   3200   4800
Test I     Arithmetic     $r_s$   .795   .846   .893   .918   .915   .874   .866
           Reasoning      $r_i$   .790   .810   .889   .892   .904   .876   .857
Test II    Number         $r_s$   .906   .895   .856   .875   .888   .888   .881
           Series         $r_i$   .795   .824   .766   .834   .894   .884   .881
Test III   Sentence       $r_s$   .864   .913   .923   .879   .883   .877   .877
           Completion     $r_i$   .574   .699   .838   .861   .883   .877   .877

                                     Time (seconds)
                                   600    900   1200   1500   2400   3300   4800
Test IV    Spatial        $r_s$   .860   .880   .890   .922   .938   .924   .911
                          $r_i$   .539   .574   .756   .764   .890   .911   .911













TABLE 2
Coefficients Estimated by the Kuder-Richardson Method
and by Various Lower-Bound Formulas

                                      Time (seconds)
                                     200     400     800    1600    2400    3200    4800
Test I       $r_i$                  .790    .810    .889    .892    .904    .876    .857
Arithmetic   $r_c$*                 not computed
Reasoning    KR20 ($\alpha$)        .587    .732    .818    .856    .855    .827    .813
             KR21                   .142    .450    .708    .800    .824    .791    .766
             $r_U$                -7.784  -2.507   -.122    .609    .806    .828    .862
             $r_x$                 -.361    .076    .507    .736    .839    .835    .862
             $r_{x'}$               .177    .456    .671    .736    .839    .835    .862
             $r_{x''}$              .583    .629            .800    .873    .852    .866

Test II      $r_i$                  .795    .824    .766    .834    .894    .884    .881
Number       $r_c$                  .077    .277    .342    .558    .823    .883    .881
Series       KR20 ($\alpha$)        .765    .789    .806            .782    .175    .770
             KR21                   .135    .190    .603    .708    .721    .721    .718
             $r_U$                -3.437  -1.819    .024    .621    .823    .886    .881
             $r_x$                 -.028    .002    .446    .691    .833    .875    .881
             $r_{x'}$               .480    .413    .520    .691    .833    .875    .881
             $r_{x''}$              .586    .511    .632    .742    .853    .882    .881

Test III     $r_i$                  .574    .699    .838    .861    .883    .877
Sentence     $r_c$                  .266    .314    .504    .830    .883    .877
Completion   KR20 ($\alpha$)        .770    .826    .855    .833    .822    .809
             KR21                   .387    .635    .781    .811    .810    .811
             $r_U$                -2.774   -.233    .576    .827    .879    .877
             $r_x$                  .137    .500    .718    .837    .879    .877
             $r_{x'}$               .527    .714    .725    .837    .879    .877
             $r_{x''}$              .622    .792    .800    .858    .881    .877

                                      Time (seconds)
                                     600     900    1200    1800    2400    3600    4800
Test IV      $r_i$                  .539    .574    .756    .880    .890    .911    .911
Spatial      $r_c$                  .267    .337    .411    .705    .819    .902    .911
             KR20 ($\alpha$)        .812    .868    .878    .921    .925    .910    .901
             KR21                   .492    .679    .738    .863    .891    .882    .871
             $r_U$                -1.094   -.021    .337    .753    .856    .905    .911
             $r_x$                  .261    .509    .606    .821    .878    .907    .911
             $r_{x'}$               .550    .690    .730    .821    .878    .907    .911
             $r_{x''}$              .669    .754    .782    .869    .901    .918    .911

*These values were not computed because items in Test I are not arranged in approximate order
of difficulty.


The spuriousness is sufficiently great for Tests III and IV to
demonstrate that $\alpha$ is no more defensible as a single-trial estimate
than is the split-half method. The KR-21 coefficient is lower than the
true coefficient for all tests and for all degrees of speeding, with
minor exceptions in Test IV. KR-21 is not to be recommended,
however. We have been able to construct tests of quite usual types for
which KR-21 coefficients are spuriously high. The good results
obtained with Tate’s data are therefore to be dismissed as a coincidence.

FIGURE 1
True and Spurious Coefficients, with Gulliksen Lower Bounds, as a
Function of Time Allowed. (Panels: Test I, Arithmetic Reasoning;
Test II, Number Series; Test III, Sentence Completion; Test IV,
Spatial. Horizontal axis: time in seconds.)

Table 2 also presents results from application of several lower-bounds
formulas: $r_c$, $r_U$, $r_x$, $r_{x'}$, and $r_{x''}$. In practical test
analysis, samples much larger than 36 cases are required for use of $r_c$,
$r_{x'}$, and $r_{x''}$. $r_c$ could not be applied to Test I because items were
not arranged in order of difficulty. Values of $r_U$ and $r_{U'}$ are shown
in Figure 1, and values of $r_c$ in Figure 2.

Values of $r_\sigma$ are not included in Table 2, but they are plotted in
Figure 1. $r_\sigma$ is always less than $r_U$, unless few items are finished.

As expected, $r_\sigma$, $r_U$, $r_{U'}$, and $r_c$ remain below $r_i$. They
therefore do serve as lower bounds. $r_x$ is a lower bound throughout the range,
but in Test IV, $r_{x'}$ and $r_{x''}$ exceed $r_i$. This is very likely a
consequence of using these formulas, which involve highly unstable
parameters, with so small a sample.

The spurious split-half coefficient always serves as an upper
bound for $r_i$ (disregarding the trivial instances where $r_i$ is slightly
the larger, presumably because of sampling error). Wherever one
of the lower bounds comes close to $r_s$, $r_i$ is determined to fall within
a narrow range of values. $r_\sigma$ comes close to $r_s$ only when nearly
everyone has finished the test. $r_U$ comes fairly close to $r_s$ over a
considerable range of time limits, and so does $r_c$. For short time limits,
the lower bounds are so far from the upper bounds that $r_i$ cannot be
estimated with useful accuracy.

If $r_U$ (or $r_{U'}$ or $r_c$) is close to $r_s$, $r_i$ can be inferred. $r_U$ is easier
to determine than $r_c$, and will generally be the preferred formula,
if we accept Gulliksen’s assumption. However, $r_U$ and $r_c$ are based
on different formulations of the problem, and for some tests they
may be expected to give substantially different results. $r_c$ takes into
account the difficulty of the items being finished as time ends. $r_c$ will
therefore ordinarily be higher than $r_U$ in tests where subjects have
reached items which are quite difficult for them, and where there is
considerable range in number finished. Whatever lower-bounds
formula is used, if the lower bound does not fall close to the upper
bound, $r_i$ cannot be estimated by a single-trial method with any
useful degree of accuracy. It does not help, for example, to know that
the lower bound is near .30 and the upper bound near .90. In Test II
at 200 seconds, with these bounds, $r_i$ is .80; in Test IV, with similar
bounds, $r_i$ is only .54.
















FIGURE 2
True Split-Half Coefficient with the Lower Bound $r_c$. (Panels: Test II,
Number Series; Test III, Sentence Completion; Test IV, Spatial.
Horizontal axis: time in seconds.)


We evaluate the formulas as follows: 


rg: An extreme lower bound, easily computed. May be used as 
a rapid check to demonstrate that speeding is negligible. In case 7 
falls much below r, , another lower-bound formula should be used. 

Yu, Ty’: Gives empirical results similar to 7, , and is much more 
readily computed. Involves an inadequately tested assumption, but 










the probable soundness of the assumption justifies tentative use of 
Ty and ry: . 

r-: Gives satisfactory empirical results, and can be used with 
r, to locate 7; approximately. 

x, Tx, and rg: Values from these formulas, together with 
r,, delimit 7; more narrowly than ry or r,. Adequate empirical trial 
impossible on present small sample. Formulas involve an assump- 
tion not yet adequately explored. 

The test author is advised to compute 7, together with ry, Ty , 
or r,. If the lower bound is close to r,, 7; can be inferred with ade- 
quate accuracy. 


A Note on Degree of Speeding 

Test theory will be clarified if we can define and measure degree 
of speeding. Then the false dichotomy between speeded and unspeed- 
ed tests can be discarded. A test is completely unspeeded when no 
subject’s standing would be altered if he were given additional time. 
Speeding is introduced if the time limit alters, not his score, but his 
true standard score in the group. A hint in an early study by Tinker
and Paterson (13) led us to the index we call τ:

$$\tau = 1 - \frac{r_{A_tB_p}\,r_{A_pB_t}}{r_{A_tB_t}\,r_{A_pB_p}}$$

A and B are equivalent forms of the test, and the subscripts t and p
indicate scores under time-limit and power conditions respectively.
In our study τ is estimated by correlating independently administered
half-tests, the power condition being taken as the performance when
all students had attempted all items. The ratio in this formula
estimates what proportion of the reliable variance in the score
obtained with a given time limit reflects the same factor as the test
does when given under unspeeded conditions; τ is the remaining
proportion. If τ is .90, ninety per cent of the true-score variance
represents a speed factor, and only ten per cent represents whatever
altitude factor is involved in responding to the test items.
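The index is simple to compute once the four cross-administration correlations are in hand. The Python sketch below is a minimal illustration only: the function name `degree_of_speeding` and the correlations in the usage line are hypothetical, not values from Tate's study. The comment on the ratio assumes classical true-score theory, under which the ratio is the squared correlation between true time-limit and true power scores, corrected for attenuation.

```python
def degree_of_speeding(r_at_bp: float, r_ap_bt: float,
                       r_at_bt: float, r_ap_bp: float) -> float:
    """Cronbach-Warrington tau from four correlations between forms A and B.

    Subscript t marks the time-limit administration and p the power
    (unspeeded) administration; e.g. r_at_bp is the correlation of form A
    given with the time limit and form B given under power conditions.
    """
    # Squared correlation between true time-limit and true power scores,
    # corrected for attenuation by the two same-condition correlations.
    shared = (r_at_bp * r_ap_bt) / (r_at_bt * r_ap_bp)
    return 1.0 - shared


# Hypothetical correlations, chosen only to illustrate the arithmetic.
print(round(degree_of_speeding(0.60, 0.60, 0.80, 0.90), 2))  # 0.5
```

If the cross-condition correlations equal the geometric mean of the two same-condition correlations, the ratio is 1 and τ is 0, which is the case of no speeding.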
Applied to Tate’s tests, τ gives the results shown in Table 3. In
addition to the tabled data, results were also obtained for about an
equal number of intermediate points. These conclusions follow:


1. The score variance due to speeding may be negligible even
though many students have not finished. An index similar to τ is
required for rigorous thinking about degree of speeding.


2. Insofar as these data are representative, speeding is very 




















TABLE 3
Degree of Speeding at Various Time Limits
Time (seconds)




















Test Statistic 200 400 800 1600 2400 3200 4800 
M, 9 1535 a3 65 60 64 
I M, 6 10 15 .s ss: ® 8 
—— » 29 48 76 109 99 65 16 
Reasoning 8, 25 38 62 88 91 - 85 8&3 
Tau ee a, ae ee .: a 
M, 13 20s 81 48 «57SiCBst«iO 
II M, 1220 17s 88 . 2 & @ 
»narea 8, 35 44 13 7 48 28 ‘0 
Series 
, 88 88 59. 70. 70...69 69 
Tau 19 69 61 29 28 - 0 
M, s ss & 56 COO — 
Ill M, 10 18 28 7 8s ws = 
Ps s, cy es a ns ee cr — 
ompletion =, 36 58 80 ao. 2 ae «= 
Tau 49 45 2 04 0 0 - 
Time: 600 900 1200 1500 2400 3300 4800 
M, 19 25 32 37 50 58 60 
IV M, 14 19 28 27 8 38 39 
Syatiol 8, 46 57 6.2 oc 2 =e 6 
S, 46 62 72 8&7 109 105 97 
Tau , oe ee ee ee ee 





f = number finished
x = number right


slight with time limits averaging one minute per item for arithmetic 
reasoning and number series; or with time limits averaging thirty 
seconds per item for sentence completion and spatial items. 

3. When τ is less than .05, the difference between 7, and 7; is
negligible. If these empirical results are dependable, the single-trial
coefficients are dependable for tests having τ of .05 or less.





Gulliksen draws attention to the ratio M/S. This is also an index
of speeding, which lacks the rational basis of τ but has the advantage
of ease of computation. For the Tate tests, τ and the Gulliksen
ratio give similar results, but in the long run the ratio is not especially
useful because it makes no allowance for the difficulty of the items
left unmarked.


General Recommendations 


Our studies lead to these suggestions: 

1. Split-half and Kuder-Richardson methods should not be used 
save to get an upper bound r, for the coefficient of equivalence of a 
speeded test. 


2. Demonstrating that either of two conditions (see text) holds 
for the test in question is a sufficient condition for assuming that the 
true coefficient is close to this upper bound.


3. Computing the lower bound 7; is useful. 7; is known to fall
between r, and r.. If these are close together, 7; is usefully
delimited. Instead of r,, other lower bounds developed by Gulliksen’s
method may be advantageous, but they cannot be finally accepted be- 
cause of the assumptions involved in the present derivations. 


When a single-trial method is used, we can obtain only a coefficient
of equivalence, showing how two samples of behavior taken
while the person is operating under the same conditions would com- 
pare. Rates of work are probably unstable, and no single-trial co- 
efficient can reveal how much of the “reliable” variance is due to 
temporary sets. 

The analysis of Tate’s data points to a recommendation for testing
and test research in general. Both our analysis and Tate’s show
that variance on a speeded test contains separate portions attribut- 
able to speed and altitude. This conception was also developed earlier 
by Baxter (2) and Davidson and Carroll (8). This implies that 
degree of speeding may be an important characteristic of a test to be 
reported in test manuals (cf. 4). τ can be estimated fairly easily,
provided two comparable forms can be administered. Knowing the 
degree of speeding would help significantly in interpreting a test. 

In factorial studies and similar research, batteries should be de- 
signed in which speed factors would emerge along with altitude fac- 
tors, as in the Davidson-Carroll study. The common practice of using 
substantially speeded tests in factorial batteries has probably caused 
us to ignore differences in factorial composition between, for exam- 
ple, speeded and unspeeded verbal tests. Factorial batteries designed 
to investigate relations between tests having various degrees of speed- 
ing would determine to what extent the speed variance represents the 
same factor in tests having different content, and to what extent the 










speed variance represents ability rather than response set. Defini- 
tion and measurement of the speed factor or factors would then lead 
naturally to a consideration of the role of speeding in test validity 
for various predictions. 

The test author has responsibility for making available depend- 
able information on the accuracy of his test. When a test is given 
with a time limit, there is no excuse for reporting a split-half co- 
efficient alone. But the author may be able to report data, based on 
only a single administration of his test, which will satisfy the de- 
mand of the test consumer. If he can demonstrate that his test is 
only slightly speeded, or if he can establish the lower bound for the 
reliability by one of the formulas we have discussed (7s, the most 
easily computed; 7,; or one of the Gulliksen bounds), he can report 
these single-trial estimates and need not use a two-trial method. But 
if the lower-bounds formulas give such low coefficients as to cast 
doubt on the usefulness of the test, two experimentally independent 
trials should be used to obtain a more exact coefficient of equivalence. 


REFERENCES 


1. Adkins, D. C. Construction and analysis of achievement tests. Washington, 
D. C.: U. S. Government Printing Office, 1947. 

2. Baxter, B. An experimental analysis of the contribution of speed and level 
in an intelligence test. J. educ. Psychol., 1941, 32, 285-296. 

3. Buros, O. K. (Ed.) The third mental measurements yearbook. New Bruns- 
wick: Rutgers University Press, 1949. 

4. Conrad, H. Information which should be provided by test publishers and
testing agencies on the validity and use of their tests. Proceedings, 1949 
Invitational Conference on Testing Problems, pp. 63-68. Princeton: Educa- 
tional Testing Service, 1950. 

5. Cronbach, L. J. Coefficient “alpha” and the internal structure of tests. To 
be published. 

6. Cronbach, L. J. Essentials of psychological testing. New York: Harper 
and Brothers, 1949. 

7. Cronbach, L. J. Test “reliability”: its meaning and determination. Psycho- 
metrika, 1947, 12, 1-16. 

8. Davidson, W. M., and Carroll, J. B. Speed and level components in time- 
limit scores, a factor analysis. Educ. psychol. Meas., 1945, 5, 411-427. 

9. Guilford, J. P. Fundamental statistics in psychology and education. New 
York: McGraw-Hill Book Co., 1950. 

10. Gulliksen, H. The reliability of speeded tests. Psychometrika, 1950, 15, 259- 
269. 

11. Guttman, L. A basis for analyzing test-retest reliability. Psychometrika, 
1945, 10, 255-282.

12. Iowa Silent Reading Test, Manual. Yonkers: World Book Co., 1948. 





13. Paterson, D. G., and Tinker, M. A. Time-limit vs. work-limit methods.
Amer. J. Psychol., 1930, 42, 101-104.

14. Tate, M. W. Individual differences in speed of response in mental test ma-
terials of varying degrees of difficulty. Educ. psychol. Meas., 1948, 8, 353-
374.

15. Thurstone, T. G., and Thurstone, L. L. Mechanical aptitude III: Descrip- 
tion of group tests. Psychometric Laboratory Reports, No. 55, 1949. 

16. Thorndike, R. L. Personnel selection. New York: John Wiley and Sons,
Inc., 1949.


Manuscript received 8/10/50 
Revised manuscript received 1/12/51 











PSYCHOMETRIKA—VOL. 16, NO. 2 
JUNE, 1951 




































OPTIMAL TEST LENGTH FOR MAXIMUM 
BATTERY VALIDITY 


PAUL HORST 
EDUCATIONAL TESTING SERVICE 


Given a fixed amount of total testing time, it is important
to know how long each test in the battery should be so that the
correlation of the battery with the criterion will be a maximum.
The precise solution for the test lengths will depend on a particular 
set of conditions which may be specified. The writer has previously 
presented solutions for two sets of conditions. This article presents 
the solution for a third set of conditions. These are: (1) The total 
number of items or testing time is fixed. (2) The score is the total 
number of items correctly answered. (3) The test lengths are de- 
termined in such a way that the correlation of total score with the 
criterion is a maximum. The solutions for the two previous sets of 
conditions, together with the current set, are summarized. A set of 
experimental data is submitted to each solution and the three sets 
of results are compared. 


Suppose a battery of tests has been administered to an appro- 
priate sample of persons on whom criterion measures are also avail- 
able. We specify that a person’s score on each test be simply the total 
number of items answered correctly. We assume now that the fol- 
lowing data are available: 


1. The number of items in each test.
2. The variance of each test.
3. The reliability of each test.
4. The validity of each test.
5. The intercorrelations of the tests.


Now if the lengths of the tests are altered, all of the statistics 
enumerated above will also be altered. In general we may assume 
the lengths of the tests to be somewhat arbitrary and we may wish 
to establish a rational basis for altering these lengths. Several ra- 
tional bases may be suggested as follows: 


1. The lengths of the tests should be such that the raw score
multiple regression weights for predicting the criterion
measures are all equal. This means that the total number
of items correct in all the tests would give the best
least-squares prediction of the criterion.


2. Having specified a fixed number of total items for all tests, 
the lengths of the tests should be such that the multiple cor- 
relation of the tests with the criterion is a maximum. Here 
we make no restrictions as to the values of the regression 
weights and therefore we could get a higher multiple corre- 
lation than if the weights were all required to be equal. 


The solution for determining the test lengths on the basis of unit 
regression weights has been presented elsewhere (1). The problem 
of determining the test lengths on the basis of maximum multiple 
correlation for fixed testing time has also been solved (2). 


A third basis for altering test length has been proposed by Pro- 
fessor Harold Gulliksen, as follows. Having specified that (1) the to- 
tal number of items desired is a fixed value, and (2) the score shall be 
the total number of items correct, let us alter the lengths of the origi- 
nal tests in such a way that the correlation of total score with the 
criterion shall be a maximum. At first it appears that Professor Gul- 
liksen’s conditions are the same as those given in the first rationale, 
namely, that the raw score regression weights should all be equal. 
However, it can be shown that the set of conditions proposed by 
Professor Gulliksen leads to a different solution. This we shall proceed
to show. First, however, we shall indicate the solutions for the
test lengths which satisfy the other two sets of conditions. For the 
sake of uniformity of treatment, these solutions will be presented 
in somewhat different form than in the references cited. For all 
three sets of conditions we specify that in altering test lengths we 
do not change the nature of the tests and hence preserve the follow- 
ing conditions: 


1. The average item variance in the test remains unchanged.

2. The average inter-item covariance for the items within the
test remains unchanged.

3. The average inter-item covariance between the items in one
test and those in another test remains unchanged.

4. The average item-criterion covariance remains unchanged.




















$a_i$ = the original length of test $i$,

$b_i$ = the altered length of test $i$,

$\sigma_i$ = the standard deviation of test $i$,

$r_{ii}$ = the reliability of test $i$,

$r_{ij}$ = the correlation of test $i$ with test $j$, and
$r_{ic}$ = the correlation of test $i$ with the criterion.


To present the solutions for each of the three sets of conditions, 

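we begin with the following four sets of simultaneous equations:

$$\begin{aligned}
r_{11}\beta_1 + r_{12}\beta_2 + \cdots + r_{1n}\beta_n &= r_{1c}\\
r_{12}\beta_1 + r_{22}\beta_2 + \cdots + r_{2n}\beta_n &= r_{2c}\\
&\;\;\vdots\\
r_{1n}\beta_1 + r_{2n}\beta_2 + \cdots + r_{nn}\beta_n &= r_{nc}
\end{aligned}\tag{1}$$

$$\begin{aligned}
r_{11}t_1 + r_{12}t_2 + \cdots + r_{1n}t_n &= \sigma_1(1 - r_{11})\\
r_{12}t_1 + r_{22}t_2 + \cdots + r_{2n}t_n &= \sigma_2(1 - r_{22})\\
&\;\;\vdots\\
r_{1n}t_1 + r_{2n}t_2 + \cdots + r_{nn}t_n &= \sigma_n(1 - r_{nn})
\end{aligned}\tag{2}$$

$$\begin{aligned}
r_{11}V_1 + r_{12}V_2 + \cdots + r_{1n}V_n &= \sqrt{a_1(1 - r_{11})}\\
r_{12}V_1 + r_{22}V_2 + \cdots + r_{2n}V_n &= \sqrt{a_2(1 - r_{22})}\\
&\;\;\vdots\\
r_{1n}V_1 + r_{2n}V_2 + \cdots + r_{nn}V_n &= \sqrt{a_n(1 - r_{nn})}
\end{aligned}\tag{3}$$

$$\begin{aligned}
r_{11}S_1 + r_{12}S_2 + \cdots + r_{1n}S_n &= a_1/\sigma_1\\
r_{12}S_1 + r_{22}S_2 + \cdots + r_{2n}S_n &= a_2/\sigma_2\\
&\;\;\vdots\\
r_{1n}S_1 + r_{2n}S_2 + \cdots + r_{nn}S_n &= a_n/\sigma_n
\end{aligned}\tag{4}$$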


The unknowns in equations (1) to (4) are the β’s, t’s, V’s, and
S’s respectively. These unknowns have no special meaning other than
in calculations. All of these sets of equations are similar in that the
coefficients of the unknowns are the correlation coefficients among
the test variables. The reliabilities of the tests are the coefficients in
the diagonal positions. The right-hand sides of the equations are all
experimentally determined values but are different from one set of
equations to the next. Since for each set of equations we have n
equations in n unknowns, we can solve for the sets of unknowns by
any one of a number of methods. Let us assume, then, that all of
the β’s, t’s, V’s, and S’s have been solved for. We shall now show
for each of the three sets of conditions how the altered test lengths,
$b_i$, may be solved for in terms of these values and the appropriate
values on the right-hand sides of the equations. The proofs of the
solutions for the first two sets of conditions have been given in the
references cited (1 and 2). The proof for the solution for the third
set of conditions will follow in this article. The solutions are as follows.
Condition I. The tests shall be altered in length so that when the
score on each test is total number of items correct, then the raw
score regression weights shall all be equal. The new test lengths b
are then*

$$b_i = \frac{a_i}{\sigma_i}\,(C\beta_i - t_i), \qquad (i = 1, 2, \dots, n), \tag{5}$$

where

$$C = \frac{\sum b_i + \sum \dfrac{a_i t_i}{\sigma_i}}{\sum \dfrac{a_i \beta_i}{\sigma_i}}. \tag{5a}$$

The multiple correlation in this case is given by

$$R^2 = \sum \beta_i r_{ic} - \frac{\left(\sum t_i r_{ic}\right)\left(\sum \dfrac{a_i \beta_i}{\sigma_i}\right)}{\sum b_i + \sum \dfrac{a_i t_i}{\sigma_i}}, \tag{6}$$

where the squared multiple correlation we would get if all tests were
perfectly reliable is given by the first term on the right of (6), namely,
$\sum \beta_i r_{ic}$.

*These β’s are not to be confused with raw score regression weights.

Condition II. The lengths of the tests shall be so altered that if the
total length of the battery is fixed, the multiple correlation of the
tests with the criterion shall be a maximum irrespective of the re-
gression weights. Then the lengths of the tests will be

$$b_i = \sqrt{a_i(1 - r_{ii})}\,(L\beta_i - V_i), \qquad (i = 1, 2, \dots, n), \tag{7}$$

where

$$L = \frac{\sum b_i + \sum V_i\sqrt{a_i(1 - r_{ii})}}{\sum \beta_i\sqrt{a_i(1 - r_{ii})}}. \tag{7a}$$

The multiple correlation for this case is given by

$$R^2 = \sum \beta_i r_{ic} - \frac{\left(\sum r_{ic}V_i\right)^2}{\sum b_i + \sum V_i\sqrt{a_i(1 - r_{ii})}}. \tag{8}$$

In this case the a’s and b’s may be used to indicate testing time instead
of number of items if we prefer.


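Condition III. The lengths of the tests shall be so altered that if the
total length of the battery is fixed and the score is the total number
of items correct, the correlation of total score with the criterion shall
be a maximum. This correlation must be equal to or less than for
Case II since the weights are all equal. Then the lengths of the tests
will be

$$b_i = \frac{a_i}{\sigma_i}\left(G\beta_i - \tfrac{1}{2}t_i + KS_i\right), \qquad (i = 1, 2, \dots, n), \tag{9}$$

where

$$G = \frac{\left(2\sum b_i + \sum \dfrac{t_i a_i}{\sigma_i}\right)^2 - \left[\sum t_i\sigma_i(1 - r_{ii})\right]\left(\sum \dfrac{S_i a_i}{\sigma_i}\right)}{2\left[\left(\sum r_{ic}S_i\right)\left(2\sum b_i + \sum \dfrac{t_i a_i}{\sigma_i}\right) - \left(\sum r_{ic}t_i\right)\left(\sum \dfrac{S_i a_i}{\sigma_i}\right)\right]} \tag{9a}$$

and

$$K = \frac{\left[\sum t_i\sigma_i(1 - r_{ii})\right]\left(\sum r_{ic}S_i\right) - \left(\sum r_{ic}t_i\right)\left(2\sum b_i + \sum \dfrac{t_i a_i}{\sigma_i}\right)}{2\left[\left(\sum r_{ic}S_i\right)\left(2\sum b_i + \sum \dfrac{t_i a_i}{\sigma_i}\right) - \left(\sum r_{ic}t_i\right)\left(\sum \dfrac{S_i a_i}{\sigma_i}\right)\right]}. \tag{10}$$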


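In this case we do not get a multiple correlation in the strictest
sense of the word. The correlation of total score with the criterion
is given, however, by a rather complicated equation as follows. We
let

$$\begin{gathered}
A_{11} = \sum r_{ic}\beta_i, \quad A_{12} = \sum r_{ic}t_i, \quad A_{13} = \sum r_{ic}V_i, \quad A_{14} = \sum r_{ic}S_i,\\
A_{22} = \sum t_i\sigma_i(1 - r_{ii}), \quad A_{23} = \sum t_i\sqrt{a_i(1 - r_{ii})}, \quad A_{24} = \sum t_i\,\frac{a_i}{\sigma_i},\\
A_{33} = \sum V_i\sqrt{a_i(1 - r_{ii})}, \quad A_{34} = \sum V_i\,\frac{a_i}{\sigma_i},\\
A_{44} = \sum S_i\,\frac{a_i}{\sigma_i}.
\end{gathered}\tag{11}$$

Then

$$R^2 = \frac{A_{11}A_{22}A_{44} + 2A_{12}A_{14}\left(A_{24} + 2\sum b_i\right) - \left[A_{11}\left(A_{24} + 2\sum b_i\right)^2 + A_{22}A_{14}^2 + A_{44}A_{12}^2\right]}{A_{22}A_{44} - \left(A_{24} + 2\sum b_i\right)^2}. \tag{12}$$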


It will be noted that equations (9), which give the lengths of the tests
for the third set of conditions, are not as simple as those for the first
two sets of conditions. The b’s for the first two sets require the so-
lutions for only two sets of unknowns, while those for the third set
require solutions for three sets of unknowns. Furthermore, two con-
stants must be determined for condition III, whereas only one is
required for I and II. Also the solutions for the constants G and K
are considerably more complicated than those for C and L. For all three
sets of conditions, the constants are functions of $\sum b_i$, or the over-all
specified length of the battery.

However, the same set of coefficients is used in solving for all
four sets of unknowns, and only the constants on the right are differ-
ent. If the Doolittle method for solving normal equations or some
variant of it is used, then the forward solutions will all be identical
except for the constant columns, and only the back solutions are
distinctly different. But the back solutions are accomplished much more
rapidly than the forward solutions. To illustrate the three different 
methods, we take data from tests administered for experimental pur- 
poses to 210 first-year students at Ohio State University Law School 
in the Fall of 1948. The tests are entitled 


1. Verbal Analogies 
2. Best Arguments 
3. Practical Judgment 


The data are as follows: 








         Intercorrelations r_ij     Validity    Reliability
Test       j = 2     j = 3            r_ic         r_ii
 1          .34       .01              .30          .78
 2                   -.05              .26          .52
 3                                     .15          .12

Test         1         2         3
σ_i         4.9       4.5       3.1
a_i          3        10        10

R² = .144.


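Using these data in equations (1) through (4) we get

$$\begin{aligned}
\beta_1 &= .122, & t_1 &= -2.603, & V_1 &= -3.301, & S_1 &= -3.885,\\
\beta_2 &= .562, & t_2 &= 8.392, & V_2 &= 9.132, & S_2 &= 9.811,\\
\beta_3 &= 1.472, & t_3 &= 26.403, & V_3 &= 28.761, & S_3 &= 31.254.
\end{aligned}$$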
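Because the four systems share one coefficient matrix, they can be solved together. The Python sketch below assumes NumPy; the helper name `solve_four_systems` is ours, not Horst’s. It solves equations (1)-(4) for the Law School data in the table above. Starting from the two-decimal correlations as printed, it reproduces the β’s to about .003 and the t’s, V’s, and S’s to within about .06; the remaining differences presumably reflect rounding in the published inputs.

```python
import numpy as np

# Law School example: intercorrelations r_ij with the reliabilities r_ii
# in the diagonal, as equations (1)-(4) require.
R = np.array([[0.78, 0.34, 0.01],
              [0.34, 0.52, -0.05],
              [0.01, -0.05, 0.12]])
r_c = np.array([0.30, 0.26, 0.15])    # validities r_ic
sigma = np.array([4.9, 4.5, 3.1])     # standard deviations sigma_i
a = np.array([3.0, 10.0, 10.0])       # original lengths a_i (5-minute units)


def solve_four_systems(R, r_c, sigma, a):
    """Solve (1)-(4); only the right-hand sides differ between the systems."""
    rel = np.diag(R)
    rhs = np.column_stack([
        r_c,                        # (1) -> beta_i
        sigma * (1.0 - rel),        # (2) -> t_i
        np.sqrt(a * (1.0 - rel)),   # (3) -> V_i
        a / sigma,                  # (4) -> S_i
    ])
    # One call handles all four constant columns, much as one Doolittle
    # forward solution serves all four systems.
    beta, t, V, S = np.linalg.solve(R, rhs).T
    return beta, t, V, S


beta, t, V, S = solve_four_systems(R, r_c, sigma, a)
print(np.round(np.vstack([beta, t, V, S]), 3))
```

Since R is symmetric, $\sum r_{ic}S_i$ equals $\sum \beta_i a_i/\sigma_i$. This is why the denominator of (5a) and $A_{14}$ in (11) are the same number, 6.074, in the computations that follow.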


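We shall also need the values given by equations (11). These are

$$\begin{aligned}
A_{11} &= .404, & A_{12} &= 5.364, & A_{13} &= 5.701, & A_{14} &= 6.074,\\
& & A_{22} &= 87.396, & A_{23} &= 94.687, & A_{24} &= 102.326,\\
& & & & A_{33} &= 102.745, & A_{34} &= 111.157,\\
& & & & & & A_{44} &= 120.361.
\end{aligned}$$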


Assuming now we do not wish to change the total test length, we have
$\sum a_i = \sum b_i = 23$.*

*This value is given in 5-minute units to facilitate computation. The total
time is actually 115 minutes.

For condition I we have








$$C = \frac{23 + 102.32}{6.08} = 20.612 \qquad \text{Eqs. (5a) \& (11)}$$

$$\begin{aligned}
b_1 &= .612\,(.122C + 2.60) = 3.13\\
b_2 &= 2.222\,(.562C - 8.39) = 7.10\\
b_3 &= 3.226\,(1.472C - 26.40) = 12.71
\end{aligned} \qquad \text{Eqs. (5)}$$

$$R^2 = .404 - \frac{5.36 \times 6.08}{23 + 102.32} = .144 \qquad \text{Eqs. (6) \& (11)}$$

For condition II we have

$$L = \frac{23 + 102.75}{5.70} = 22.061 \qquad \text{Eqs. (7a) \& (11)}$$

$$\begin{aligned}
b_1 &= .812\,(.122L + 3.30) = 4.86\\
b_2 &= 2.191\,(.562L - 9.13) = 7.16\\
b_3 &= 2.966\,(1.472L - 28.76) = 11.02
\end{aligned} \qquad \text{Eqs. (7)}$$

$$R^2 = .404 - \frac{(5.70)^2}{23 + 102.75} = .146 \qquad \text{Eqs. (8) \& (11)}$$

For condition III we have

$$G = \frac{(148.322)^2 - 87.396 \times 120.361}{2\,[6.074 \times 148.322 - 120.361 \times 5.362]} = 22.463 \qquad \text{Eqs. (9a) \& (11)}$$

$$K = \frac{87.396 \times 6.074 - 5.362 \times 148.322}{2\,[6.074 \times 148.322 - 120.361 \times 5.362]} = -.5175 \qquad \text{Eqs. (10) \& (11)}$$

$$\begin{aligned}
b_1 &= .612\,[.122G - \tfrac{1}{2}(-2.603) + K(-3.885)] = 3.70\\
b_2 &= 2.222\,[.562G - \tfrac{1}{2}(8.392) + K(9.811)] = 7.44\\
b_3 &= 3.226\,[1.472G - \tfrac{1}{2}(26.403) + K(31.254)] = 11.85
\end{aligned} \qquad \text{Eqs. (9)}$$

$$R^2 = \frac{(.404 \times 87.396 \times 120.361) + (2 \times 5.362 \times 6.074 \times 148.322) - [.404\,(148.322)^2 + 87.396\,(6.074)^2 + 120.361\,(5.362)^2]}{87.396 \times 120.361 - (148.322)^2} = .145 \qquad \text{Eqs. (12) \& (11)}$$

It will be noted that for all three conditions the rank order of
the optimal times is the same. The R²’s for all three differ only in
the third decimal place. As a matter of fact, the optimal test lengths
for the three conditions do not vary greatly from the original lengths of
3, 10, 10, which means that the original lengths were not far from
optimal. This accounts for the fact that the original R² of .144 is close
to the other R²’s.

It should be noted that although the three conditions give nearly 
the same values for R?, condition I is lowest of the three and condi- 
tion II highest. This is according to theory. 

Both cases I and III would be especially appropriate in the large 
scale use of IBM scoring machines by numerous operators working 
under unknown conditions of supervision. Case III gives the theo- 
retically correct solution. It is probable that in many cases, however, 
as in the above example, the differences in validity for all three cases 
may not be significant. The R’s and the b’s for cases I and II are 
considerably easier to calculate than for case III. Case II cannot
be less than either I or III, hence may be regarded as an upper limit
for case III. Case I would be a lower limit. In general, then, if one
were determining optimal lengths for a battery of unit weights, he 
would calculate the R’s for cases I and II. If they did not differ 
significantly, he would calculate the b’s for case I, knowing that the 
results would not be significantly poorer than for case III, which 
requires considerably more computation. 

If, however, the R for case II should be significantly higher than 
that for case I, it would probably be best to use equations (9) for 
calculating the test lengths. 

If the problem of computing weighted composite scores is not 
regarded as crucial then, in general, the case II method should be 
used and test lengths computed by equations (7). 
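The three solutions can be compared numerically. The following Python sketch assumes NumPy, and its helper names are hypothetical. It computes the lengths and R² values from equations (5) through (12) for the Law School data and cross-checks them against the covariance model of the appendix, in which lengthening a test by the factor k = b/a multiplies its true-score variance by k² and its error variance by k. Under that model, the sketch checks three things: condition I gives equal raw-score regression weights, (8) is the multiple correlation at the condition II lengths, and (12) is the correlation of the unweighted total at the condition III lengths. Computed from the two-decimal inputs, the lengths differ from the printed ones by at most about .05 and the R²’s by about .001.

```python
import numpy as np

R = np.array([[0.78, 0.34, 0.01],      # reliabilities r_ii in the diagonal
              [0.34, 0.52, -0.05],
              [0.01, -0.05, 0.12]])
r_c = np.array([0.30, 0.26, 0.15])     # validities r_ic
sigma = np.array([4.9, 4.5, 3.1])      # standard deviations sigma_i
a = np.array([3.0, 10.0, 10.0])        # original lengths a_i (5-minute units)
T = 23.0                               # fixed total length, sum of the b_i


def optimal_lengths(R, r_c, sigma, a, T):
    """Lengths b and R^2 for conditions I, II and III, equations (5)-(12)."""
    rel = np.diag(R)
    u_t = sigma * (1.0 - rel)          # right side of (2)
    u_v = np.sqrt(a * (1.0 - rel))     # right side of (3)
    u_s = a / sigma                    # right side of (4)
    beta, t, V, S = np.linalg.solve(R, np.column_stack([r_c, u_t, u_v, u_s])).T
    # The A-values of (11) that the formulas need.
    A11, A12, A13, A14 = r_c @ beta, r_c @ t, r_c @ V, r_c @ S
    A22, A24, A33, A44 = u_t @ t, u_s @ t, u_v @ V, u_s @ S
    C = (T + A24) / (u_s @ beta)                       # (5a)
    L = (T + A33) / (u_v @ beta)                       # (7a)
    W = 2 * T + A24
    den = 2 * (A14 * W - A44 * A12)
    G = (W**2 - A22 * A44) / den                       # (9a)
    K = (A22 * A14 - A12 * W) / den                    # (10)
    r2_iii = (A11 * A22 * A44 + 2 * A12 * A14 * W
              - (A11 * W**2 + A22 * A14**2 + A44 * A12**2)) / (A22 * A44 - W**2)
    return {
        "I": (u_s * (C * beta - t), A11 - A12 * A14 / (T + A24)),   # (5), (6)
        "II": (u_v * (L * beta - V), A11 - A13**2 / (T + A33)),     # (7), (8)
        "III": (u_s * (G * beta - t / 2 + K * S), r2_iii),          # (9), (12)
    }


def altered_moments(b, R, r_c, sigma, a):
    """Test covariances and criterion covariances after changing a -> b.

    True variance scales with k**2, error variance with k (k = b/a);
    the criterion is in standard units.
    """
    k = b / a
    w = k * sigma
    cov = np.outer(w, w) * R + np.diag(k * sigma**2 * (1.0 - np.diag(R)))
    return cov, w * r_c


def unit_weight_r2(b, *data):
    cov, cc = altered_moments(b, *data)
    return cc.sum() ** 2 / cov.sum()         # unweighted total vs. criterion


def multiple_r2(b, *data):
    cov, cc = altered_moments(b, *data)
    return cc @ np.linalg.solve(cov, cc)     # best-weighted composite


results = optimal_lengths(R, r_c, sigma, a, T)
for name, (b, r2) in results.items():
    print(name, np.round(b, 2), round(r2, 4))
```

In this sketch the ordering R²(I) ≤ R²(III) ≤ R²(II) comes out as the text states. The gaps are small because the original lengths were already near optimal.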


Appendix: Proof of the Solution for Condition III
We let

$D_a$ = a diagonal matrix whose elements are the lengths of
the original tests,

$D_b$ = a diagonal matrix whose elements are the lengths of
the altered tests,

$D_{\sigma_a}$ = a diagonal matrix of the standard deviations of the
original tests,

$P_a$ = the matrix of covariances of the original tests,

$P_{ac}$ = the column vector of validity covariances of the original
tests,

$P_b$ = the matrix of covariances of the altered tests,
$P_{bc}$ = the column vector of validity covariances of the altered
tests,
$D_u$ = a diagonal matrix whose elements are one minus the test
reliabilities, and
$R_t$ = the correlation between a criterion and a score on a
battery of tests all of which have unit weight.

It can readily be shown that

$$R_t^2 = \frac{\mathbf{1}'P_{bc}P_{bc}'\mathbf{1}}{\mathbf{1}'P_b\mathbf{1}}, \tag{1}$$

where $\mathbf{1}$ is a column vector all of whose elements are unity. It can also
be proved that

$$P_b = D_bD_a^{-1}P_aD_a^{-1}D_b - D_b^2D_a^{-1}D + D_bD, \tag{2}$$

$$P_{bc} = D_bD_a^{-1}P_{ac}, \tag{3}$$

where

$$D = D_uD_a^{-1}D_{\sigma_a}^2. \tag{4}$$

Substituting (4) in (2), we have

$$P_b = D_bD_a^{-1}P_aD_a^{-1}D_b - D_bD_a^{-1}D_{\sigma_a}D_uD_{\sigma_a}D_a^{-1}D_b + D_bD_a^{-1}D_uD_{\sigma_a}^2. \tag{5}$$

Writing (1) for the tests of altered length from (3) and (5) yields

$$R_t^2 = \frac{\mathbf{1}'D_bD_a^{-1}P_{ac}P_{ac}'D_a^{-1}D_b\mathbf{1}}{\mathbf{1}'\left[D_bD_a^{-1}P_aD_a^{-1}D_b - D_bD_a^{-1}D_{\sigma_a}D_uD_{\sigma_a}D_a^{-1}D_b + D_bD_a^{-1}D_uD_{\sigma_a}^2\right]\mathbf{1}}. \tag{6}$$

We write $D_b\mathbf{1} = V_b$,

$$P_{ac} = D_{\sigma_a}r_{ac}, \tag{7}$$

$$P_a = D_{\sigma_a}r_aD_{\sigma_a}, \tag{8}$$

and substitute (7) and (8) in (6) to get

$$R_t^2 = \frac{V_b'D_a^{-1}D_{\sigma_a}r_{ac}r_{ac}'D_{\sigma_a}D_a^{-1}V_b}{V_b'D_a^{-1}D_{\sigma_a}(r_a - D_u)D_{\sigma_a}D_a^{-1}V_b + V_b'D_uD_{\sigma_a}^2D_a^{-1}\mathbf{1}}. \tag{9}$$

Dropping the second subscript on $D_{\sigma_a}$ and dropping the subscript of
$r_a$, we let

$$y = D_\sigma D_a^{-1}V_b. \tag{10}$$











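We substitute (10) in (9) and get

$$R_t^2 = \frac{y'r_{ac}r_{ac}'y}{y'(r - D_u)y + y'D_uD_\sigma\mathbf{1}}. \tag{11}$$

From (10),

$$D_aD_\sigma^{-1}y = V_b. \tag{12}$$

Now we wish to determine y so as to maximize (11) with the
condition that the sum of the b’s is some specified value. We write

$$R_t^2 = \frac{f_1^2}{f_2}, \qquad f_1 = r_{ac}'y, \qquad f_2 = y'(r - D_u)y + y'D_uD_\sigma\mathbf{1}, \tag{13}$$

$$\mathbf{1}'V_b = \mathbf{1}'D_aD_\sigma^{-1}y = T. \tag{14}$$

From (13) and (14) we write

$$\varphi = \frac{f_1^2}{f_2} - \lambda\left(\mathbf{1}'D_aD_\sigma^{-1}y - T\right), \tag{15}$$

where λ is the Lagrangian multiplier. Taking differentials of both
sides of (15), we find

$$d\varphi = \frac{2f_1}{f_2}\,df_1 - \frac{f_1^2}{f_2^2}\,df_2 - \lambda\,\mathbf{1}'D_aD_\sigma^{-1}\,dy. \tag{16}$$

From (11), (13), (14), and (16), we get

$$\frac{\partial\varphi}{\partial y} = \frac{2f_1}{f_2}\,r_{ac} - \frac{2f_1^2}{f_2^2}\left[(r - D_u)y + \tfrac{1}{2}D_uD_\sigma\mathbf{1}\right] - \lambda D_\sigma^{-1}D_a\mathbf{1}. \tag{17}$$

We let

$$r_{ac} = U_1, \tag{18}$$

$$D_\sigma D_u\mathbf{1} = U_2, \tag{19}$$

$$D_\sigma^{-1}D_a\mathbf{1} = U_3, \tag{20}$$

$$(r - D_u) = p. \tag{21}$$

Substituting equations (18) through (21) in (17), equating to
zero, and solving for y, we get: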





$$y = p^{-1}\left(GU_1 - \tfrac{1}{2}U_2 + KU_3\right), \tag{22}$$

where

$$G = \frac{f_2}{f_1}, \tag{23}$$

$$K = -\frac{\lambda f_2^2}{2f_1^2}. \tag{24}$$

From (12) and (22) we have

$$V_b = D_aD_\sigma^{-1}p^{-1}\left(GU_1 - \tfrac{1}{2}U_2 + KU_3\right), \tag{25}$$

which gives the formal solution for the b’s. However, we still have
the unknowns G and K to solve for. We let

$$U_i'p^{-1}U_j = U_j'p^{-1}U_i = A_{ij}. \tag{26}$$

From (11), (13), and (18),

$$f_1 = U_1'y. \tag{27}$$

From (11), (13), (19), and (21),

$$f_2 = y'py + U_2'y. \tag{28}$$

From (14) and (20),

$$\mathbf{1}'V_b = U_3'y = T. \tag{29}$$

From (22), (26), and (27),

$$f_1 = GA_{11} - \tfrac{1}{2}A_{12} + KA_{13}. \tag{30}$$

From (22), (26), and (28),

$$f_2 = G\left(GA_{11} - \tfrac{1}{2}A_{12} + KA_{13}\right) + \tfrac{1}{2}\left(GA_{12} - \tfrac{1}{2}A_{22} + KA_{23}\right) + K\left(GA_{13} - \tfrac{1}{2}A_{23} + KA_{33}\right). \tag{31}$$

From (22), (26), and (29),

$$T = GA_{13} - \tfrac{1}{2}A_{23} + KA_{33}. \tag{32}$$

Now let

$$E_i = GA_{1i} - \tfrac{1}{2}A_{2i} + KA_{3i}. \tag{33}$$

Substituting (33) in (30), (31), and (32) respectively, we have

$$f_1 = E_1, \tag{34}$$

$$f_2 = GE_1 + \tfrac{1}{2}E_2 + KE_3, \tag{35}$$

$$T = E_3. \tag{36}$$

From (23) and (34),

$$Gf_1 = f_2 = GE_1. \tag{37}$$

From (36),

$$KT = KE_3. \tag{38}$$

Subtracting (37) and (38) from (35) yields

$$-2KT = E_2. \tag{39}$$

But from equation (33), we can write (34), (39), and (36) respectively

$$\begin{aligned}
f_1 &= GA_{11} - \tfrac{1}{2}A_{12} + KA_{13},\\
-2KT &= GA_{12} - \tfrac{1}{2}A_{22} + KA_{23},\\
T &= GA_{13} - \tfrac{1}{2}A_{23} + KA_{33}.
\end{aligned}\tag{40}$$

Now let

$$H_1 = \frac{G}{f_1}, \tag{41}$$

$$H_2 = -\frac{1}{2f_1}, \tag{42}$$

$$H_3 = \frac{K}{f_1}. \tag{43}$$

With the aid of equations (41), (42), and (43), we rewrite (40)

$$\begin{aligned}
1 &= H_1A_{11} + H_2A_{12} + H_3A_{13},\\
0 &= H_1A_{12} + H_2A_{22} + H_3(A_{23} + 2T),\\
0 &= H_1A_{13} + H_2(A_{23} + 2T) + H_3A_{33}.
\end{aligned}\tag{44}$$

If we let

H = a vector of the $H_i$’s,
U = a matrix of the $U_i$’s, i.e., $(U_1, U_2, U_3)$,
$e_i$ = a vector all of whose elements are zero but the ith,
which is 1, and
$e_{ij}$ = a 3 × 3 matrix all of whose elements are zero but the
ijth, which is 1,

then (44) can be rewritten in matrix notation

$$e_1 = \left[U'p^{-1}U + 2T(e_{23} + e_{32})\right]H \tag{45}$$

or

$$H = \left[U'p^{-1}U + 2T(e_{23} + e_{32})\right]^{-1}e_1, \tag{46}$$

which gives the solution for the $H_i$’s. Then from (13), (23), and
(41)

$$R_t^2 = \frac{1}{H_1}. \tag{47}$$

From (41) and (42),

$$G = -\frac{H_1}{2H_2}. \tag{48}$$

From (42) and (43),

$$K = -\frac{H_3}{2H_2}. \tag{49}$$

It can be shown that

$$A_{11} = R_\infty^2, \tag{50}$$

where $R_\infty^2$ is the maximum possible correlation assuming all tests
of infinite length or perfect reliability.

The values given by (47), (48), and (49) can be readily solved
for in terms of T and the $A_{ij}$ from equations (44). Using also (50),
they are respectively

$$R_t^2 = \frac{R_\infty^2A_{22}A_{33} + 2A_{12}A_{13}(A_{23} + 2T) - \left[R_\infty^2(A_{23} + 2T)^2 + A_{22}A_{13}^2 + A_{33}A_{12}^2\right]}{A_{22}A_{33} - (A_{23} + 2T)^2}, \tag{51}$$

$$G = \frac{(2T + A_{23})^2 - A_{22}A_{33}}{2\left[A_{13}(2T + A_{23}) - A_{33}A_{12}\right]}, \tag{52}$$

$$K = \frac{A_{22}A_{13} - A_{12}(2T + A_{23})}{2\left[A_{13}(2T + A_{23}) - A_{33}A_{12}\right]}. \tag{53}$$
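The matrix route (45)-(49) and the closed forms (51)-(53) can be cross-checked numerically. The Python sketch below assumes NumPy; the function name is hypothetical. It solves the 3 × 3 system (45) directly, compares the result with (51)-(53), and uses the A-values printed in the worked example. In the appendix numbering $U_3$ holds the $a_i/\sigma_i$, so the appendix $A_{13}$, $A_{23}$, and $A_{33}$ correspond to the main text’s $A_{14}$, $A_{24}$, and $A_{44}$. Both routes give $R_t^2 \approx .145$, $G \approx 22.46$, and $K \approx -.5175$, in line with the worked example.

```python
import numpy as np


def solve_condition_iii(A11, A12, A13, A22, A23, A33, T):
    """Return (R_t^2, G, K) two ways: via (45)-(49) and via (51)-(53).

    A_ij are the appendix quantities U_i' p^{-1} U_j; T is the fixed sum of the b's.
    """
    # Equation (45): [U' p^{-1} U + 2T(e_23 + e_32)] H = e_1
    M = np.array([[A11, A12, A13],
                  [A12, A22, A23 + 2 * T],
                  [A13, A23 + 2 * T, A33]])
    H1, H2, H3 = np.linalg.solve(M, np.array([1.0, 0.0, 0.0]))
    matrix_route = (1 / H1, -H1 / (2 * H2), -H3 / (2 * H2))     # (47)-(49)

    W = 2 * T + A23
    den = 2 * (A13 * W - A33 * A12)
    closed_form = (
        (A11 * A22 * A33 + 2 * A12 * A13 * W
         - (A11 * W**2 + A22 * A13**2 + A33 * A12**2)) / (A22 * A33 - W**2),  # (51)
        (W**2 - A22 * A33) / den,                                             # (52)
        (A22 * A13 - A12 * W) / den,                                          # (53)
    )
    return matrix_route, closed_form


# Printed values from the Law School example (appendix numbering).
matrix_route, closed_form = solve_condition_iii(
    A11=0.404, A12=5.362, A13=6.074, A22=87.396, A23=102.326, A33=120.361, T=23.0)
print(np.round(matrix_route, 4), np.round(closed_form, 4))
```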


REFERENCES 
1. Horst, Paul. Regression weights as a function of test length. Psychometrika,
1948, 13, 125-132.
2. Horst, Paul. Determination of optimal test length to maximize the multiple
correlation. Psychometrika, 1949, 14, 79-88.










Manuscript received 5/1/50. 
Revised manuscript received 6/7/50. 

















PSYCHOMETRIKA—VOL. 16, NO. 2 
JUNE, 1951 


REMARKS ON THE METHOD OF PAIRED COMPARISONS: 
II. THE EFFECT OF AN ABERRANT STANDARD 
DEVIATION WHEN EQUAL STANDARD 
DEVIATIONS AND EQUAL COR- 
RELATIONS ARE ASSUMED* 


FREDERICK MOSTELLER 
HARVARD UNIVERSITY 


If customary methods of solution are used on the method of 
paired comparisons for Thurstone’s Case V (assuming equal stand- 
ard deviations of sensations for each stimulus), when in fact one or 
more of the standard deviations is aberrant, all stimuli will be prop- 
erly spaced except the one with the aberrant standard deviation. A 
formula is given to show the amount of error due to the aberrant 
stimulus. 


1. Introduction. In a previous article† we showed that the ordinary
solution to Thurstone’s Case V of the method of paired comparisons
was a least-squares solution. It was also pointed out that
for Case V it was not necessary to assume that all correlations be- 
tween stimulus sensations were zero; it was sufficient to assume the 
correlations were equal. Thurstone’s Case V assumes that all stand- 
ard deviations of stimulus sensations are equal. In this article we 
will investigate the effect of an aberrant standard deviation on the 
Case V solution. We will deal with error-free data. 


2. The Problem of the Aberrant Standard Deviation. As in the
previous article, we suppose the objects $O_1, O_2, \dots, O_n$ to have
sensation means $S_1, S_2, \dots, S_n$. We shall also assume

$$\begin{aligned}
&\text{the standard deviation of } X_i = \sigma_i = \sigma, && (i = 1, 2, \dots, n-1);\\
&\text{the standard deviation of } X_n = \sigma_n; \text{ and}\\
&\text{the correlation } \rho_{ij} = \rho.
\end{aligned}\tag{1}$$

In other words, the standard deviations are all equal except the one
associated with $S_n$, and the correlations are all equal.


*This research was performed in the Laboratory of Social Relations under 
a grant made available to Harvard University by the RAND Corporation under 
the Department of the Air Force, Project RAND. 

†Mosteller, Frederick, Remarks on the Method of Paired Comparisons: I.
The Least Squares Solution Assuming Equal Standard Deviations and Equal 
Correlations. Psychometrika, 1951, 16, 3-9. 




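Then we may define the matrix of differences between means in
original standard deviation units as

$$
X_{ij} = \frac{S_i - S_j}{\sqrt{\sigma_i^2 + \sigma_j^2 - 2\rho\,\sigma_i\sigma_j}},
\qquad (i, j = 1, 2, \ldots, n); \tag{2}
$$

or

$$
\begin{aligned}
X_{ij} &= \frac{S_i - S_j}{\sqrt{2\sigma^2(1-\rho)}}, && (i, j \neq n),\\
X_{in} &= \frac{S_i - S_n}{\sqrt{\sigma^2 + \sigma_n^2 - 2\rho\,\sigma\sigma_n}}.
\end{aligned}
\tag{3}
$$

There is no loss of generality and great gain in convenience if
we define

$$ 2\sigma^2(1-\rho) = 1. \tag{4} $$

Also

$$ \sigma^2 + \sigma_n^2 - 2\rho\,\sigma\sigma_n = \sigma_a^2. \tag{5} $$

Now if we write our $X_{ij}$ matrix we have:

$$
\begin{array}{c|ccccc}
X_{ij} & 1 & 2 & 3 & \cdots & n\\ \hline
1 & S_1 - S_1 & S_1 - S_2 & S_1 - S_3 & \cdots & (S_1 - S_n)/\sigma_a\\
2 & S_2 - S_1 & S_2 - S_2 & S_2 - S_3 & \cdots & (S_2 - S_n)/\sigma_a\\
3 & S_3 - S_1 & S_3 - S_2 & S_3 - S_3 & \cdots & (S_3 - S_n)/\sigma_a\\
\vdots & \vdots & \vdots & \vdots & & \vdots\\
n & (S_n - S_1)/\sigma_a & (S_n - S_2)/\sigma_a & (S_n - S_3)/\sigma_a & \cdots & (S_n - S_n)/\sigma_a
\end{array}
$$

We will work out the least-squares solution much as described in the
earlier article, as if $\sigma_a$ were unity. That is, we behave as if the
standard deviations were equal, as we would if we were experimenters
using Case V. This merely involves summing the columns and averaging.
From this matrix the total for the $i$th column is

$$
\begin{aligned}
S_i^* &= \sum_{j=1}^{n-1} S_j - (n-1)S_i + (S_n - S_i)/\sigma_a, && (i = 1, 2, \ldots, n-1);\\
S_n^* &= \Bigl(\sum_{j=1}^{n-1} S_j - (n-1)S_n\Bigr)\Big/\sigma_a, && (i = n).
\end{aligned}
\tag{6}
$$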































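The $S_i^*$ are essentially estimates of the least-squares solution when
the standard deviations are in fact equal. We can perform linear
transformations on them without changing the symbol. Because the
result would only be good to a linear transformation, we are allowed
to subtract $\sum_{j=1}^{n-1} S_j$ from all these results, and then we temporarily set
$S_n = 0$. This gives

$$
\begin{aligned}
S_i^* &= -S_i\left[n - 1 + (1/\sigma_a)\right], && (i = 1, 2, \ldots, n-1);\\
S_n^* &= \sum_{j=1}^{n-1} S_j \left[(1/\sigma_a) - 1\right].
\end{aligned}
\tag{7}
$$

We may change the scale factor assumed in equation (4) by multiplying
through by

$$ \frac{-1}{(n-1) + 1/\sigma_a}, $$

and this at last gives

$$
\begin{aligned}
S_i^* &= S_i, && (i = 1, 2, \ldots, n-1),\\
S_n^* &= \left[\frac{1 - 1/\sigma_a}{n - 1 + 1/\sigma_a}\right] \sum_{j=1}^{n-1} S_j;
\end{aligned}
\tag{8}
$$

or, since $S_n = 0$,

$$ S_n^* = \left[\frac{n(1 - 1/\sigma_a)}{n - 1 + 1/\sigma_a}\right] \bar{S}, \qquad (S_n = 0), \tag{9} $$

where $\bar{S}$ is the grand mean of $S_1, S_2, \ldots, S_n$.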


The gratifying part of this result is that all the $S_i$ are properly
spaced relative to one another except $S_n$. In other words, changing
one of the standard deviations affects only the position of the
object with the aberrant sensation standard deviation. We note, of
course, that when $\sigma_a^2 = 1$ [i.e., $\sigma^2 + \sigma_n^2 - 2\rho\,\sigma\sigma_n = 2\sigma^2(1-\rho)$],
$S_n^* = 0$ as anticipated. We also note that when the grand mean $\bar{S}$ is
small, that is, when $S_n$ is centrally located with respect to the other
stimulus means, the effect of an aberrant stimulus is small. Thus if we
have reason to believe that some particular object has a much different
sensation variability from the rest, the other objects should be so chosen
that the aberrant one is near the center of the scale, or else it should be
excluded.

If we suppose $\sigma_a > 1$, and $n$ of reasonable size, we may approximate
$S_n^*$ by

$$ S_n^* \approx (1 - 1/\sigma_a)\,\bar{S}. $$
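The claim that only the aberrant stimulus is displaced can be checked numerically on error-free data. The Python sketch below is an illustration under stated assumptions, not the author's computation: it uses the normalisation of equations (4) and (5), so every difference not involving the aberrant stimulus has unit standard deviation and every difference involving it has standard deviation $\sigma_a$; it places the aberrant stimulus last; and it recovers the linear transformation from the first two stimuli. The function names `case_v_positions` and `predicted_aberrant` are hypothetical.

```python
def case_v_positions(S, sigma_a):
    """Scale values a Case V analysis assigns to error-free data.

    S       -- true sensation means; the LAST entry is the aberrant stimulus
               (at least three stimuli, with S[0] != S[1])
    sigma_a -- sd of any difference involving the last stimulus, in units in
               which every other difference has sd 1 (equations (4), (5))
    Returns positions rescaled so the non-aberrant stimuli match S.
    """
    n = len(S)
    last = n - 1

    def sd_diff(i, j):
        return sigma_a if last in (i, j) else 1.0

    # Equation (3): X_ij = (S_i - S_j) / sd of the difference; zero diagonal.
    X = [[0.0 if i == j else (S[i] - S[j]) / sd_diff(i, j) for j in range(n)]
         for i in range(n)]
    # Case V practice: ignore the unequal sd and average each column.
    col = [sum(X[i][j] for i in range(n)) / n for j in range(n)]
    # Columns of the non-aberrant stimuli are an exact linear function of S_j,
    # so two of them determine the transformation back to the original scale.
    slope = (col[1] - col[0]) / (S[1] - S[0])
    intercept = col[0] - slope * S[0]
    return [(c - intercept) / slope for c in col]


def predicted_aberrant(S, sigma_a):
    """Equation (9), rewritten for an arbitrary origin rather than S_n = 0."""
    n = len(S)
    s_bar = sum(S) / n
    factor = n * (1 - 1 / sigma_a) / (n - 1 + 1 / sigma_a)
    return S[-1] + factor * (s_bar - S[-1])


S = [0.0, 1.0, 2.5, 4.0, 6.0, -3.0]          # aberrant stimulus listed last
print([round(v, 4) for v in case_v_positions(S, sigma_a=2.0)])
print(round(predicted_aberrant(S, 2.0), 4))
```

The first five printed positions reproduce the true means, and the last agrees with the value from equation (9); with the aberrant stimulus placed at the grand mean, its position comes back unchanged for any $\sigma_a$.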


3. Examples.


(a) Suppose the values $S_1, S_2, \ldots, S_6$ are $-4, -2, 0, 1, 2, 3$
and $S_3$ has a standard deviation different from the rest. Application
of equation (9) shows that the spacing will be correct.


(b) With the same $S$ values as in Example (a), suppose $S_1$
has $\sigma_a = 2$. Then $S_2, \ldots, S_6$ will be properly scaled with values 2,
4, 5, 7, 10 (we must add 4 to all values because we take $S_1 = 0$) and

$$ S_1^* = \frac{6(1 - \tfrac{1}{2})}{5 + \tfrac{1}{2}} \cdot \frac{28}{6} = 2.55 $$

instead of zero.


(c) With the same $S$ values as in Example (a), suppose $S_1$
has $\sigma_a = 1/2$. Then $S_2, \ldots, S_6$ will again be properly scaled as in
Example (b), but $S_1^* = -4$ instead of zero.


4. Generalization to Several Aberrant Standard Deviations. Al- 
though it will not be shown here, the generalization to several aber- 
rant standard deviations is immediate. If we have a set of objects 
$O_1, O_2, \ldots, O_n$, with variances $\sigma_1^2 = \cdots = \sigma_{n-k}^2 = \sigma^2$ and $\sigma_{n-k+1}^2, \ldots, \sigma_n^2$, then
the standard method of solving paired comparisons, Case V, will 
leave those stimuli with equal variances appropriately spaced. Of 
course, there need to be at least three stimuli with equal variances 
for this result to be interesting or useful. 

It follows that if we have two or more sets of stimuli such that 
the standard deviations within each set are equal, each set will itself 
be properly spaced, but the sets will not be spaced or positioned 
correctly relative to one another. 
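The generalization, which the text states without proof, can be probed the same way. In this sketch (hypothetical helper names; a common correlation $\rho$ is assumed, as in equation (1)), each stimulus has its own standard deviation, and a group of stimuli counts as properly spaced when its Case V column means are an exact linear function of its true means.

```python
import math


def case_v_columns(S, sigmas, rho):
    """Case V column means for stimuli with sds `sigmas` and common correlation rho."""
    n = len(S)

    def sd_diff(i, j):
        # sd of X_i - X_j, the denominator of equation (2)
        return math.sqrt(sigmas[i] ** 2 + sigmas[j] ** 2
                         - 2 * rho * sigmas[i] * sigmas[j])

    return [sum((S[i] - S[j]) / sd_diff(i, j) for i in range(n) if i != j) / n
            for j in range(n)]


def properly_spaced(S, sigmas, rho, group, tol=1e-9):
    """True if the column means of the stimuli in `group` (indices; the first
    two must have different S) are an exact linear function of their true means."""
    col = case_v_columns(S, sigmas, rho)
    a, b = group[0], group[1]
    slope = (col[b] - col[a]) / (S[b] - S[a])
    return all(abs(col[g] - col[a] - slope * (S[g] - S[a])) < tol for g in group)


S = [0.0, 1.0, 3.0, -1.0, 2.0, 4.0]
sigmas = [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]           # two sets with equal sds
print(properly_spaced(S, sigmas, 0.3, [0, 1, 2]),  # first set
      properly_spaced(S, sigmas, 0.3, [3, 4, 5]),  # second set
      properly_spaced(S, sigmas, 0.3, [0, 1, 3]))  # stimuli drawn from both sets
```

Each equal-variance set passes on its own, while a group that mixes the two sets does not, which matches the statement that the sets are not positioned correctly relative to one another.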

It is conceivable that, in a practical situation, a different method
could be used for some of the measurements to estimate the relative
sizes of the sigmas, and that this information could then be useful.

Thurstone has already noted that small changes in the sigmas
do not affect the solution much.


Manuscript received 8/22/50. 










































REMARKS ON THE METHOD OF PAIRED COMPARISONS: 
III. A TEST OF SIGNIFICANCE FOR PAIRED COM- 
PARISONS WHEN EQUAL STANDARD DEVIA- 
TIONS AND EQUAL CORRELATIONS 
ARE ASSUMED* 


FREDERICK MOSTELLER 
HARVARD UNIVERSITY 


A test of goodness of fit is developed for Thurstone’s method
of paired comparisons, Case V. The test involves the computation
of

$$ \chi^2 = n \sum \frac{(\theta'' - \theta')^2}{821}, $$

where $n$ is the number of observations per pair, and $\theta''$ and $\theta'$ are
the angles obtained by applying the inverse sine transformation to
the fitted and the observed proportions respectively. The number of
degrees of freedom is $(k-1)(k-2)/2$, where $k$ is the number of stimuli.


1. Introduction 

It would be useful in Thurstone’s method of paired comparisons 
to have a measure of the goodness of fit of the estimated proportions 
to the observed proportions. Ideally we might try to find estimates 
of the stimuli positions $S'_j$ such that we can reproduce the observed
proportions $p'_{ij}$ as closely as possible in some sense.

One kind of test might be based on

$$ \chi^2 = \sum \frac{(p''_{ij} - p'_{ij})^2}{\sigma_{ij}^2}, $$

where $p''_{ij}$ is the estimate of $p'_{ij}$ derived from the $S'_j$, and
$\sigma_{ij}^2 = p_{ij}(1 - p_{ij})/n$ is the binomial variance of $p'_{ij}$. But the true
$p_{ij}$ are not known and would have to be replaced by the observed
$p'_{ij}$. If one does replace the $p_{ij}$ by $p'_{ij}$ and $\sigma_{ij}$ by $\sigma'_{ij}$, then it is
possible to fit the $S'_j$ by means of a minimum chi-square criterion.
However, such a procedure calls for an iterative scheme and involves
extremely tedious computations. An alternative method is suggested
by the inverse sine transformation.


*This research was performed in the Laboratory of Social Relations under 
a grant made available to Harvard University by the RAND Corporation under 
the Department of the Air Force, Project RAND. 




2. The model 


It is assumed that we have a set of stimuli which, when pre- 
sented to a subject, produce sensations. These sensations are as- 
sumed to be normally distributed, perhaps with different means. How- 
ever the standard deviations of each distribution are assumed to be 
the same, and the correlations between pairs of stimuli sensations 
are assumed equal. 

Subjects are presented with pairs of stimuli and asked to state 
which member of each pair is greater with respect to some property 
attributed to all the stimuli (the property is the dimension of the 
scale we are trying to form). Our observations consist of the
proportions of times stimulus $j$ is judged “greater than” stimulus $i$. We
call these proportions $p'_{ij}$ to indicate that they are observations and
not the true proportions $p_{ij}$.

From the observed proportions we compute normal deviates $X'_{ij}$
and proceed in the usual way (5) to estimate the stimulus positions,
$S'_j$, on the sensation scale. Once the $S'_j$ are found we can retrace to
get the fitted normal deviates $X''_{ij}$ and the fitted proportions $p''_{ij}$.

Our problem is to provide a method for ascertaining how well
the fitted $p''_{ij}$ agree with the observed $p'_{ij}$.

In such a test of significance involving goodness of fit, we are 
interested in knowing what the null hypothesis and the alternative 
hypothesis are. In the present case the null hypothesis is given by 
the model assumed above. However, the alternative hypothesis is 
quite general: merely that the null hypothesis is not correct. In
particular, the null hypothesis assumes additivity, so that if $D_{ij}$ is the
distance from $S_i$ to $S_j$ and $D_{jk}$ is the distance from $S_j$ to $S_k$, we should
find

$$ D_{ik} = D_{ij} + D_{jk}. $$


If we do not have unidimensionality this additivity property will 
usually not hold. 

For example, consider the case of three stimuli with $S_1 < S_2 < S_3$.
If the standard deviation of each distribution is the same, we might
write

$$
\begin{aligned}
D_{12} &= S_2 - S_1,\\
D_{13} &= S_3 - S_1,\\
D_{23} &= S_3 - S_2.
\end{aligned}
$$

Since we can choose $S_1 = 0$ and $S_2 = D_{12}$, $S_3$ from the second equation
must be $D_{13}$. Finally

$$ D_{23} = D_{13} - D_{12}. $$







Since each of our comparisons of stimuli is done independently, it is
not necessary that this relation hold either for the observations or
for the theoretical values. Indeed, the observed value of $D_{23}$ could
have conflicted with the assumption of additivity. Such a failure of
additivity makes it harder to fit the observed $p'_{ij}$, and on the
average such a failure will increase the value of $\chi^2$ in our test.

It can also happen that the standard deviations of the various 
stimuli are not equal even though unidimensionality obtains. In this 
case our attempt to fit the data under the equal standard deviations 
assumption will sometimes fail, and this failure will be reflected, in
general, in a failure of additivity and thus an increase in $\chi^2$.


3. The transformation 


Like so many other good things in statistics, the inverse sine 
transformation was developed by R. A. Fisher (4). Further dis- 
cussion by Bartlett (1, 2), Eisenhart (8), and Mosteller and Tukey 
(7) may be of interest to those who wish to examine the literature. 
The facts essential to the present discussion are these: If we have an 
observed $p'$ arising from a binomial sample of size $n$ from a popula-
tion with true proportion of successes $p$, then

$$ \theta' = \arcsin\sqrt{p'} \tag{1} $$

is approximately normally distributed with variance

$$ \sigma_{\theta'}^2 = \frac{821}{n}, \tag{2} $$

nearly independent of the true $p$, when $\theta'$ is measured in degrees. A
table for making the transformation to angles has been computed by 
C. I. Bliss (3), and is readily available in G. W. Snedecor’s Statistical 
Methods (4th Edition), p. 450. 
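In place of the Bliss table, the transformation in equations (1) and (2) can be computed directly. This minimal sketch assumes Python's standard `math` module; the names `angle`, `angle_variance`, and `DEGREES_VARIANCE_CONSTANT` are hypothetical. The constant shows where 821 comes from: the large-sample variance of $\arcsin\sqrt{p'}$ is $1/(4n)$ in radians squared, and converting to degrees multiplies it by $(180/\pi)^2 \approx 3283$.

```python
import math

# Large-sample variance of arc sin sqrt(p') is 1/(4n) in radians squared;
# in degrees squared it is (180/pi)^2 / (4n), and (180/pi)^2 / 4 is about
# 820.7, which the text rounds to 821.
DEGREES_VARIANCE_CONSTANT = (180 / math.pi) ** 2 / 4


def angle(p):
    """Inverse sine transformation of a proportion, in degrees."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return math.degrees(math.asin(math.sqrt(p)))


def angle_variance(n):
    """Approximate variance of angle(p') for a binomial sample of size n."""
    return DEGREES_VARIANCE_CONSTANT / n


print(round(DEGREES_VARIANCE_CONSTANT, 1), round(angle(0.727), 2), round(angle_variance(22), 2))
```

For samples of 22, as in the baseball example, the variance is close to 821/22.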

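Then if we define

$$
\begin{aligned}
\theta'_{ij} &= \arcsin\sqrt{p'_{ij}},\\
\theta''_{ij} &= \arcsin\sqrt{p''_{ij}},
\end{aligned}
\tag{3}
$$

where $p'_{ij}$ are the observed proportions and $p''_{ij}$ are the proportions
derived from fitting the $S'_j$, we can test goodness of fit by

$$ \chi^2 = \sum \frac{(\theta''_{ij} - \theta'_{ij})^2}{821/n}. \tag{4} $$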













If there are $k$ stimuli we have $k$ parameters to fit, the $k$ values $S'_j$.
But two of these are the zero point and the scale factor, which are
arbitrary. This leaves $k-2$ parameters free for fitting the data.
There are $k(k-1)/2$ values $p'_{ij}$ to be fitted. So it appears that the
appropriate number of degrees of freedom for the test is
$k(k-1)/2 - (k-2) - 1 = (k-1)(k-2)/2$. We note that with two stimuli we
can always fit the data perfectly, so there should be zero degrees of 
freedom as the formula indicates. 


4. Illustrative example 

To illustrate the test we will use the paired comparison method 
on the American League baseball record for 1948. The following 
table gives the observed $p'_{ij}$. The number in the $i$th row and $j$th
column is the proportion of games won by the team named at the top 
of the jth column from the team named at the left of the ith row. 
In this situation we regard the clubs as stimuli which have distribu- 
tions of performances. The number of games each club plays with 
each other club is 22 (except for minor fluctuations). Successive 
tables indicate the steps in the solution. The steps are these: 


1. From the $p'_{ij}$ table obtain the $X'_{ij}$ table from a table of the normal
integral.

2. Solve for the $S'_j$ by summing columns and averaging.

3. Use the $S'_j$ to obtain the $X''_{ij}$: $X''_{ij} = S'_j - S'_i$.

4. Use the $X''_{ij}$ to obtain the $p''_{ij}$ from a table of the normal integral.

5. Compute $\theta''$, $\theta'$, and $\theta'' - \theta'$.

6. Get the sum of squares of $\theta'' - \theta'$.

7. Divide the sum of squares by $821/n$, here $821/22$.

8. Look up the result in a $\chi^2$ table with $(k-1)(k-2)/2$ degrees of
freedom.
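The eight steps can be sketched in Python, with `statistics.NormalDist` (Python 3.8+) standing in for the table of the normal integral and `math.asin` for the angle table. This is an illustrative sketch, not the author's worksheet. The function name is hypothetical. The input is assumed to be a full $k \times k$ table whose entry in row $i$, column $j$ is the proportion won by the column team $j$ from the row team $i$. Proportions of exactly 0 or 1 are not handled.

```python
import math
from statistics import NormalDist

NORMAL = NormalDist()  # standard normal; stands in for the table of the normal integral


def angle(p):
    """Inverse sine transformation arc sin sqrt(p), in degrees."""
    return math.degrees(math.asin(math.sqrt(p)))


def case_v_goodness_of_fit(P, n):
    """Steps 1-8 for a k x k table of observed proportions.

    P[i][j] -- proportion of comparisons in which column stimulus j beat row
               stimulus i; the diagonal is ignored, and every off-diagonal
               entry must lie strictly between 0 and 1 (inv_cdf rejects 0 and 1)
    n       -- number of comparisons per pair
    Returns (scale values S'_j, chi-square, degrees of freedom).
    """
    k = len(P)
    # Step 1: normal deviates X'_ij.
    X = [[0.0 if i == j else NORMAL.inv_cdf(P[i][j]) for j in range(k)]
         for i in range(k)]
    # Step 2: sum each column and average.
    S = [sum(X[i][j] for i in range(k)) / k for j in range(k)]
    total = 0.0
    for i in range(k):
        for j in range(i):                        # one cell per pair (lower triangle)
            p_fit = NORMAL.cdf(S[j] - S[i])       # steps 3-4: X''_ij, then p''_ij
            diff = angle(p_fit) - angle(P[i][j])  # step 5: theta'' - theta'
            total += diff ** 2                    # step 6: sum of squares
    chi2 = total / (821.0 / n)                    # step 7
    df = (k - 1) * (k - 2) // 2                   # step 8: degrees of freedom
    return S, chi2, df
```

For the 1948 table, call it with `n = 22`; with eight teams it reports 21 degrees of freedom. Because it works from unrounded normal deviates, the $\chi^2$ it returns may differ slightly in the second decimal from a hand computation based on three-place tables.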

PROPORTIONS OF ALL GAMES THAT THE TEAM GIVEN AT THE TOP
OF THE COLUMN WON FROM THOSE AT THE LEFT (1948)
Each Entry Represents 22 Games








$p'_{ij}$ Table

         Clev.   Bost.   N.Y.    Phil.   Det.    St.L.   Wash.   Chic.
Clev.     --     .478    .545    .273    .409    .364    .273    .273
Bost.    .522     --     .364    .455    .318    .318    .318    .364
N.Y.     .455    .636     --     .455    .409    .273    .227    .273
Phil.    .727    .545    .545     --     .545    .182    .364    .273
Det.     .591    .682    .591    .455     --     .500    .273    .364
St.L.    .636    .682    .727    .818    .500     --     .545    .381
Wash.    .727    .682    .773    .636    .727    .455     --     .429
Chic.    .727    .636    .727    .727    .636    .619    .571     --


































$X'_{ij}$ Table

         Clev.   Bost.   N.Y.    Phil.   Det.    St.L.   Wash.   Chic.
Clev.     --     −.055   +.113   −.604   −.230   −.348   −.604   −.604
Bost.    +.055    --     −.348   −.113   −.473   −.473   −.473   −.348
N.Y.     −.113   +.348    --     −.113   −.230   −.604   −.749   −.604
Phil.    +.604   +.113   +.113    --     +.113   −.908   −.348   −.604
Det.     +.230   +.473   +.230   −.113    --      .000   −.604   −.348
St.L.    +.348   +.473   +.604   +.908    .000    --     +.113   −.303
Wash.    +.604   +.473   +.749   +.348   +.604   −.113    --     −.179
Chic.    +.604   +.348   +.604   +.604   +.348   +.303   +.179    --
Total   +2.332  +2.173  +2.065  +0.917  +0.132  −2.143  −2.486  −2.990
S'_j     .2915   .2716   .2581   .1146   .0165  −.2678  −.3108  −.3738

$X''_{ij}$ Table ($X''_{ij} = S'_j - S'_i$)






















Table of $\theta''$, $\theta'$, $\theta'' - \theta'$

                 Clev.   Bost.   N.Y.    Phil.   Det.    St.L.   Wash.
Bost.   θ''      45.46
        θ'       46.26
        θ''−θ'   −0.80
N.Y.    θ''      45.75   45.29
        θ'       42.42   52.89
        θ''−θ'   +3.33   −7.60
Phil.   θ''      49.02   48.56   48.27
        θ'       58.50   47.58   47.58
        θ''−θ'   −9.48   +0.98   +0.69
Det.    θ''      51.24   50.83   50.48   47.24
        θ'       50.24   55.67   50.24   42.42
        θ''−θ'   +1.00   −4.84   +0.24   +4.82
St.L.   θ''      57.54   57.10   56.79   53.67   51.47
        θ'       52.89   55.67   58.50   64.75   45.00
        θ''−θ'   +4.65   +1.43   −1.71  −11.08   +6.47
Wash.   θ''      58.44   58.05   57.73   54.63   52.42   45.97
        θ'       58.50   55.67   61.55   52.89   58.50   42.42
        θ''−θ'   −0.06   +2.38   −3.82   +1.74   −6.08   +3.55
Chic.   θ''      59.80   59.41   59.08   55.98   53.85   47.41   46.43
        θ'       58.50   52.89   58.50   58.50   52.89   51.88   49.08
        θ''−θ'   +1.30   +6.52   +0.58   −2.52   +0.96   −4.47   −2.65

$$ \sum(\theta'' - \theta')^2 = 551.40, \qquad 821/22 = 37.32, $$

$$ \chi^2_{21} = 14.78, \qquad .80 < P(\chi^2) < .90. $$
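Step 8 relies on a printed $\chi^2$ table. The tail probability can also be computed directly. The sketch below uses a hypothetical helper, `chi2_sf`, built on the standard power series for the regularized incomplete gamma function, which is one of several textbook methods. A library routine such as SciPy's `scipy.stats.chi2.sf` would serve the same purpose.

```python
import math


def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom exceeds x).

    Hypothetical helper: 1 minus the regularized lower incomplete gamma
    function P(df/2, x/2), summed by its power series. Adequate for moderate
    x such as the values in this example.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a            # n = 0 term of sum_n z^n / (a (a+1) ... (a+n))
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


chi2 = 551.40 / (821 / 22)     # step 7: sum of squares divided by 821/n
print(round(chi2, 2), round(chi2_sf(chi2, 21), 3))
```

With 21 degrees of freedom, the computed probability falls between .80 and .90, which agrees with the table lookup.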





The chi-square result shows rather good agreement between the 
fitted data and the observed data. Investigation of additional base- 
ball data has suggested that the agreement is usually too good rather 
than not good enough. It was suggested to the author that a possible 
reason for this is that the proportion of games won by any team from 
another team involves an admixture of games played at home and 
away, and that if these were separated we might then not get such 
consistently good agreement. As an example, suppose probabilities
of winning at home and away are .25 and .75 respectively, averaging
.50. The variance of games won based on p = .50 is n/4, but
based on n/2 games at .25 and n/2 at .75, the variance is 3n/16, 
somewhat smaller. The decrease in variance would be similar to 
that gained from stratified sampling. Calculations not presented here 
suggest that this may be the case. 
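The home-and-away arithmetic can be checked exactly with rational numbers. The sketch assumes, as the example does, that games are independent trials and that half of the n games are played at each win probability. The helper name is hypothetical.

```python
from fractions import Fraction


def games_won_variance(n, probs):
    """Variance of total games won when the n games are split equally among
    the win probabilities in `probs` (independent Bernoulli trials)."""
    per_group = Fraction(n, len(probs))
    return sum(per_group * p * (1 - p) for p in probs)


n = 16
print(games_won_variance(n, [Fraction(1, 2)]))                  # 4, that is, n/4
print(games_won_variance(n, [Fraction(1, 4), Fraction(3, 4)]))  # 3, that is, 3n/16
```

The mixture's variance is smaller for any even n, which is the stratification effect the text describes.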

It should be remembered that we have found the best $S'_j$'s in
the least-squares sense to reproduce the $X'_{ij}$'s, and have not done our
best to reproduce the $\theta$'s. This means that, had we used a more
elaborate method of fitting, we might have obtained a still better fit and
consequently a higher value of $P$ (which is already quite high).


5. The power of the test for three stimuli 

The power of the test developed, that is the probability of re- 
jecting the null hypothesis when it is false, is rather awkward to 
investigate. The power depends on the degree of divergence from 
the assumptions, the number of stimuli involved, the number of ob- 
servations for each pair of stimuli, as well as the significance level 
chosen. We will discuss the power for a rather special case. This case 
has the advantage that it displays the workings of the chi-square 
test rather clearly and is easy to compute. Our procedure will be: 
(1) set up the model, (2) compute $\chi^2$ for this case, (3) insert a de-
parture from the model, (4) investigate the power for the special
case under consideration. 

We will assume that the standard deviations of the differences
between pairs of stimuli are unity. The true stimuli means are in
the order $S_3 > S_2 > S_1$. Furthermore we will assume that these means
are sufficiently close to one another that the approximation

$$ p_{ij} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{S_j - S_i} e^{-x^2/2}\,dx \approx \frac{1}{2} + \frac{S_j - S_i}{\sqrt{2\pi}} \tag{5} $$

will be adequate. For this case $p_{ij}$ will be nearly 1/2, so we will be
able to use the approximation

$$ \sigma^2(p'_{ij}) = \frac{1}{4n} = \sigma^2. \tag{6} $$
Working with this case will have the further advantage that we will
not need to use the inverse sine transformation but can work directly
with

$$ \chi^2 = \sum_{i<j} \frac{(p'_{ij} - p''_{ij})^2}{\sigma^2}, \tag{7} $$

since our principal reason for working with the transformation was
that $\sigma^2$ was not known.
The observations can be written

$$ p'_{ij} = p_{ij} + k_{ij}\sigma. \tag{8} $$

Here the unprimed $p$ is the true proportion of the time stimulus $j$
is reported to exceed stimulus $i$, the primed $p$ is the corresponding
observed proportion, $\sigma^2$ is $1/(4n)$, and $k_{ij}$ is a random normal deviate
with zero mean and standard deviation unity. The sample size $n$ is
assumed to be reasonably large.

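Under these assumptions

$$
p'_{ij} = p_{ij} + k_{ij}\sigma
\approx \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{S_j - S_i + k_{ij}\sigma\sqrt{2\pi}} e^{-x^2/2}\,dx
\approx \frac{1}{2} + \frac{S_j - S_i + k_{ij}\sigma\sqrt{2\pi}}{\sqrt{2\pi}}. \tag{9}
$$

Thus the normal deviate corresponding to $p'_{ij}$ is approximately

$$ D'_{ij} = S_j - S_i + k_{ij}\sigma\sqrt{2\pi}. \tag{10} $$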


Now we insert these values in the paired comparison table as usual 
and solve for the estimates of the stimuli positions S’; by summing 
columns and averaging. After adding the mean of the true stimuli 
positions these estimates are: 


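$$
\begin{aligned}
S'_1 &= S_1 - (k_{12} + k_{13})\,\sigma\sqrt{2\pi}/3,\\
S'_2 &= S_2 + (k_{12} - k_{23})\,\sigma\sqrt{2\pi}/3,\\
S'_3 &= S_3 + (k_{13} + k_{23})\,\sigma\sqrt{2\pi}/3.
\end{aligned}
\tag{11}
$$

We take the differences of these pairs to get the fitted normal
deviates, the $D''_{ij}$:

$$
\begin{aligned}
D''_{12} &= S_2 - S_1 + (2k_{12} + k_{13} - k_{23})\,\sigma\sqrt{2\pi}/3,\\
D''_{13} &= S_3 - S_1 + (k_{12} + 2k_{13} + k_{23})\,\sigma\sqrt{2\pi}/3,\\
D''_{23} &= S_3 - S_2 + (-k_{12} + k_{13} + 2k_{23})\,\sigma\sqrt{2\pi}/3.
\end{aligned}
\tag{12}
$$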


Now the fitted proportions $p''_{ij}$ are approximately

$$ p''_{ij} \approx \frac{1}{2} + \frac{D''_{ij}}{\sqrt{2\pi}}. \tag{13} $$


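When we take the differences $p'_{ij} - p''_{ij}$ we get

$$
\begin{aligned}
p'_{12} - p''_{12} &= (k_{12} - k_{13} + k_{23})\,\sigma/3,\\
p'_{13} - p''_{13} &= (-k_{12} + k_{13} - k_{23})\,\sigma/3,\\
p'_{23} - p''_{23} &= (k_{12} - k_{13} + k_{23})\,\sigma/3.
\end{aligned}
\tag{14}
$$

Now immediate computation of $\chi^2$, inserting the values from equations
(14) into equation (7), gives

$$ \chi^2 = \left(\frac{k_{12} - k_{13} + k_{23}}{\sqrt{3}}\right)^2. \tag{15} $$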


Since the $k$'s are normally and independently distributed with zero
means and unit variance, the quantity in parentheses is in turn a
normal deviate with zero mean and unit variance, because the standard
deviation of the sum in the numerator is $\sqrt{3}$. Of course, the
square of such a normal deviate is distributed like $\chi^2$ with one
degree of freedom. In this special case, then, we have shown how the
$\chi^2$ test arises.
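Equation (15) says that, under the linear approximations (5), (6), (9), and (13), the whole fitting procedure reduces to a single normal deviate. The sketch below checks this algebra for one arbitrary choice of errors $k_{ij}$. It builds the observed proportions from equation (8), converts them to normal deviates with the linear approximation (10), averages columns, and evaluates equation (7). It stays entirely inside the linear approximation, so it verifies the derivation, not the accuracy of the approximations. The function name and the inputs are hypothetical.

```python
import math

ROOT_2PI = math.sqrt(2 * math.pi)
PAIRS = [(0, 1), (0, 2), (1, 2)]


def linearized_chi2(S, k, n):
    """Chi-square of equation (7) for three nearby stimuli.

    S -- true stimulus means S_1, S_2, S_3 (indices 0, 1, 2)
    k -- dict mapping each pair (i, j), i < j, to its normal error k_ij
    n -- number of observations per pair
    """
    sigma = 1 / (2 * math.sqrt(n))  # sd of p'_ij near 1/2, equation (6)
    # Equation (8), with p_ij from the linear approximation (5).
    p_obs = {(i, j): 0.5 + (S[j] - S[i]) / ROOT_2PI + k[(i, j)] * sigma
             for i, j in PAIRS}
    # Equation (10): D'_ij by the same linear approximation; D'_ji = -D'_ij.
    D = [[0.0] * 3 for _ in range(3)]
    for (i, j), p in p_obs.items():
        D[i][j] = (p - 0.5) * ROOT_2PI
        D[j][i] = -D[i][j]
    # Case V: column means estimate S'_j up to an additive constant.
    S_fit = [sum(D[i][j] for i in range(3)) / 3 for j in range(3)]
    chi2 = 0.0
    for i, j in PAIRS:
        p_fit = 0.5 + (S_fit[j] - S_fit[i]) / ROOT_2PI      # equation (13)
        chi2 += (p_obs[(i, j)] - p_fit) ** 2 / sigma ** 2   # equation (7)
    return chi2


k = {(0, 1): 0.7, (0, 2): -1.3, (1, 2): 0.4}
print(linearized_chi2([0.0, 0.05, 0.12], k, n=100))
print(((k[(0, 1)] - k[(0, 2)] + k[(1, 2)]) / math.sqrt(3)) ** 2)   # equation (15)
```

The two printed values agree up to rounding error, whatever stimulus means and errors are supplied.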

We have incidentally set up the machinery for examining the 
power of the test for our special case. Until now we have assumed 
that the $p_{ij}$ were arranged to get consistency in the spacings
between the true stimuli means. We now relax this condition. In
particular, let us suppose that the consistent $p_{23}$ is replaced by $p_{23} + \Delta$,
where $\Delta$ is an error due to the lack of unidimensionality of the
stimuli we are considering. This means that $p'_{23}$ will be replaced by
$p'_{23} + \Delta$, which in turn means that $k_{23}$ will be replaced by $k_{23} + \Delta/\sigma$.
Now when we come to compute $\chi^2$ with the null hypothesis not
satisfied we get

$$ \chi^2 = \left(x + \Delta/(\sqrt{3}\,\sigma)\right)^2. \tag{16} $$


Here $x$ is a normal deviate, the expression inside the parentheses on
the right of equation (15). If we are working with a significance
test at the 5% level we will reject the null hypothesis unless

$$ -1.96 < x + \Delta/(\sqrt{3}\,\sigma) < 1.96. $$

The following table indicates very roughly how often we will reject
the null hypothesis as $\Delta/(\sqrt{3}\,\sigma)$ takes various values.








Δ/(√3 σ)    Percent rejected
1           16%
1.96        50%
2           52%
3           84%


We say roughly because when A takes large values our approxima- 
tions no longer hold very well. Nevertheless these values are indica- 
tive of the magnitudes. 

Let us see how much error there must be in $p_{23}$ to raise the
rejection level to 16%. Suppose $n = 48$. Then

$$
\begin{aligned}
\frac{\Delta^2}{3\sigma^2} &= 1,\\
\Delta^2 = 3\sigma^2 &= \frac{3}{4 \times 48} = \frac{1}{64},\\
\Delta &= .125.
\end{aligned}
$$

Thus for samples as large as 48, $p_{23}$ must deviate from the consistent
value of approximately .5 by as much as .125 to raise the probability
of rejection from 5% to 16%.
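The rough table and the value $\Delta = .125$ can be recomputed from equation (16). The sketch assumes $x$ is exactly standard normal and uses `statistics.NormalDist`; the helper names are hypothetical. The exact normal probabilities come out at about 17% and 85% for $\Delta/(\sqrt{3}\,\sigma) = 1$ and 3, slightly above the rough 16% and 84% in the table, while 1.96 and 2 reproduce 50% and 52%.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def rejection_rate(shift, z_crit=1.96):
    """P(|x + shift| > z_crit) for standard normal x: the chance that the
    one-degree-of-freedom test of equation (16) rejects at the 5% level."""
    return Z.cdf(-z_crit - shift) + (1.0 - Z.cdf(z_crit - shift))


def delta_for_shift(n, shift=1.0):
    """Delta that makes Delta / (sqrt(3) sigma) equal `shift`, with sigma^2 = 1/(4n)."""
    sigma = math.sqrt(1.0 / (4.0 * n))
    return shift * math.sqrt(3.0) * sigma


for shift in (0, 1, 1.96, 2, 3):
    print(shift, round(100 * rejection_rate(shift)))
print(delta_for_shift(48))
```

With no departure from the model (shift 0), the rejection rate is the nominal 5%.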

A short discussion of the kinds of alternatives that can exist 
in paired comparisons and the general behavior of this test against 
these may assist the reader. The principal ways the Case V assump- 
tions can be violated are 


(1) lack of normality, 
(2) lack of unidimensionality, 


(3) failure of the equal standard deviation of differences as- 
sumption. 


Failure of normality is not important to the method of paired 
comparisons, as we shall show elsewhere. It is just as well then that 
the present test will be very poor at detecting deviations from nor- 
mality. The normality assumption is more in the nature of a compu- 
tational device than anything else. 

Lack of unidimensionality will be reflected in the failure of dis-
tances between estimated stimuli positions to agree with the ob-
served distances, and thus we will have high chi-square values. The
principal alternative of interest, then, is one for which the test is
sensitive.

Unfortunately it is also possible that we have unidimensionality 
without having equality of standard deviations of differences of pairs. 
The result of using Case V may be to give a large chi-square value 
when this happens. This is not uniformly true however. It is pos- 
sible to have unequal standard deviations without detecting this fact 
in the Case V solution as has been shown elsewhere (6). In particu- 
lar, if there is only one aberrant standard deviation, and if the stim- 
ulus mean for that stimulus is near the mean of all the stimulus posi- 
tions, the chi-square test will not be likely to detect this failure of 
the model. The best that can be said is that sometimes such aberra- 
tions will cause high values of chi-square and sometimes not, depend- 
ing on the nature of the case. 

We might like to relax our conditions and not use Case V but try 
to use some other case. However, this requires a large number of 
stimuli. In the case of the assumption of independence between pairs 
of stimuli we still have for $k$ stimuli a total of $k$ means and $k$ vari-
ances to choose. Two of these $2k$ values are merely scale and location
parameters, so we have in all $2k - 2$ things that can be varied as
against $k(k-1)/2$ cell entries. Thus we need at least 5 stimuli to
begin to get degrees of freedom for testing. With a reasonable num- 
ber of stimuli we could still test for unidimensionality in the face of 
unequal stimulus variabilities. When we come to the completely gen- 
eral case, allowing the correlation coefficients to vary as well, the 
problem is hopeless. We now have more degrees of freedom at our 
disposal than there are in the table. It seems reasonable then never 
to try to test for unidimensionality under a more general assumption 
than equal correlations and unequal variances for the stimuli. 
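The counting argument can be tabulated. The sketch below uses a hypothetical helper. It takes the Case V figure from section 3 and the parameter counts stated in this paragraph. It reproduces two conclusions: at least 5 stimuli are needed before the unequal-variance case leaves any degrees of freedom, and the fully general case never leaves any.

```python
def degrees_of_freedom(k, model):
    """Cells in the paired-comparison table minus fitted parameters (hypothetical helper).

    model -- 'case_v'     : (k-1)(k-2)/2, as obtained in section 3
             'unequal_sd' : k means and k variances, less location and scale
             'general'    : as 'unequal_sd' plus k(k-1)/2 correlations
    """
    cells = k * (k - 1) // 2
    if model == "case_v":
        return (k - 1) * (k - 2) // 2
    if model == "unequal_sd":
        return cells - (2 * k - 2)
    if model == "general":
        return cells - (2 * k - 2) - k * (k - 1) // 2
    raise ValueError(f"unknown model: {model}")


for k in range(3, 9):
    print(k, degrees_of_freedom(k, "case_v"),
          degrees_of_freedom(k, "unequal_sd"), degrees_of_freedom(k, "general"))
```

For the eight baseball teams the Case V count is the 21 degrees of freedom used above.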


6. Conclusions 

A test of the assumptions underlying Thurstone’s method of 
paired comparisons is developed and illustrated. The inner workings 
of the test and an indication of its power are provided for a special 
case involving three stimuli lying very close to one another. Although 
the method is developed and applied for Thurstone’s Case V, it can 
be applied to any paired comparison case providing some degrees of 
freedom are left over after the process of estimating the spacings 
between the stimuli positions has been completed. 










REFERENCES


1. Bartlett, M. S. The square root transformation in analysis of variance.
Supp. J. roy. stat. Soc., 1936, 3, 68-78.

2. Bartlett, M. S. The use of transformations. Biometrics Bull., 1947, 3, 39-52.

3. Bliss, C. I. Plant protection, No. 12. Leningrad, 1937.

4. Fisher, R. A. On the dominance ratio. Proc. roy. Soc. Edinb., 1922, 42,
321-341.

5. Mosteller, F. Remarks on the method of paired comparisons: I. The least
squares solution assuming equal standard deviations and equal correlations.
Psychometrika, 1951, 16, 3-9.

6. Mosteller, F. Remarks on the method of paired comparisons: II. The effect
of an aberrant standard deviation when equal standard deviations and equal
correlations are assumed. Psychometrika, 1951, 16, 203-206.

7. Mosteller, F., and Tukey, J. W. The uses and usefulness of binomial
probability paper. J. Amer. statist. Ass., 1949, 44, 174-212.

8. Statistical Research Group, Columbia University. Selected techniques of
statistical analysis. New York: McGraw-Hill Book Co., 1947.


Manuscript received 9/2/50 
Revised manuscript received 11/13/50 















RATE OF ADDITION AS A FUNCTION OF DIFFICULTY 
AND AGE* 


JAMES E. BIRREN AND JACK BOTWINICK 


Rate of addition was studied as a function of difficulty as meas- 
ured by problem length. The hypothesis was tested that the rate 
of addition would decline as a function of the logarithm of the num- 
ber of addition operations per problem. The test material required 
the rapid addition of single columns of digits ranging from two to 
twenty-five digits in length. Rate of uncorrected addition declined 
as a power function of problem length and the rate of correct addi- 
tion declined as an exponential function of length. Results indicated 
that subjects who varied in age and mental status could be differen- 
tiated according to the parameters defining the curves of addition 
rate as a function of length. 


Introduction 

This study was designed to investigate the effect of problem 
length upon the rate of addition. The objectives were to determine 
whether: (a) a psychophysical relation exists between the rate of 
addition and difficulty as measured by problem length and (b) in- 
creasing problem difficulty results in disproportionate reduction in 
rate of addition in a class of individuals known to have altered men- 
tal function, e.g., senile mental patients. 

A group of early investigators (6, 11, 12) studied rate of addi- 
tion as a function of fatigue, drug effects, and other conditions. This 
work suggested using addition rate as the measure of performance. 
By changing the number of digits per problem it was anticipated 
that difficulty could be varied systematically and rate of performance 
could be described as a function of the number of operations com- 
prising the task. 

The ratio of change in difficulty to change in performance would 
describe a continuous function if a uniform relation exists between 
these variables. This presupposes that there exists a rational basis 
for estimating the difficulty of a task and a satisfactory measure of 
performance; both are chronic issues in psychometric research. If 

*The cooperation of the staff of the Baltimore City Schools, the cooperation 
of the staffs of the Spring Grove and the Springfield Maryland State Hospitals 
and the Sheppard and Enoch Pratt Hospital, the assistance of Miss Charlotte
Fox in devising the test material and gathering some of the data, and the
computational assistance of Mrs. Betty Benser are gratefully acknowledged.


a method of experimental control of difficulty were available, one 
could compare stimuli from diverse fields and accordingly, compare 
the subjects’ abilities on an absolute basis (3, 4). One method of 
describing the magnitude of a stimulus is its length or number of 
composite elements. In the present study the measure of difficulty 
selected was problem length or the number of addition operations 
in a problem. If it is demonstrated that addition performance de- 
clines systematically as a function of problem length, then difficulty 
can be defined operationally by the number of addition operations. 


In such a relation the investigator must select an appropriate 
measure of performance as well as of difficulty. An adequate meas- 
ure of performance appears to lie in the use of time scores or rate 
of performance. Landahl (5) suggested the use of time scores in 
place of item scores and Thurstone (10) has also incorporated a 
time score in a conceptual relation between difficulty, time, and prob- 
ability of completing the problem. Thurstone defined the ability of 
a subject as the difficulty level at which the probability is one-half 
that the subject will complete the problem in infinite time. 

A distinct advantage of time scores is their applicability through- 
out the range of difficulty. Thus a time score gives information as 
to how well an individual handles a very simple item, i.e., how much 
time is required. In contrast a relative measure of difficulty, per cent 
passing, gives no discrimination among individuals for easy material. 
Coombs (2) has pointed out that one method of improving psycho- 
metric measurements would be to develop a method for collecting 
data which would enable us to know how well a subject passed an 
item or how badly he failed it. The use of time scores appears to 
be such a method and may be a key to developing more homogeneous 
tests. The use of the ratio of change in rate of performance to 
change in difficulty might yield “purer” psychological measurements 
than now obtainable since fewer constants would be involved. The 
investigator assumes with this approach that the subject’s standard 
of accuracy or his level of motivation is unchanged during the test 
session (10). It is not necessary for such factors to be the same from 
person to person, only that they be constant within the individual. If 
constant, their effect would be removed from the derivative of the 
function, i.e., the change in rate of performance as a function of 
change in difficulty. 

Tate (9) found evidence of a general speed factor as well as 
specific speed abilities in performance of several types of test ma-
terial. The demonstration of specific and general speed factors offers
encouragement to the development of rational measures of difficulty.
By the use of time scores, abilities could be analyzed in a difficulty 
range unmeasurable by item scores. 


Methods 
Materials: Pages of addition problems consisting of single columns 
of digits were prepared from series of random numbers, eliminating 
zero. On a single page only problems of one given length were used. 
Problems were prepared containing 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, and 
25 digits. The digits were typed in primer size type, eight char- 
acters per inch, to insure legibility. 


Subjects: A total of 193 high-school boys and girls was tested. They 
were in the age range of 16 to 20 years, were in the fourth year of 
high school, and were native-born white. They were selected by 
classes to yield a group of average intelligence using the results of 
the Otis Self Administering Test given by the school; the mean I.Q. 
was 104.6, the standard deviation 9.0. 


A total of 50 subjects between the ages of 60-69 years was tested. 
These subjects were convalescing patients in general hospitals, and 
residents of The Baltimore City Home for the Aged. All subjects 
were selected after individual interviews in which each subject’s 
vision was tested and his age, education, and nativity were deter- 
mined. 

A total of 33 patients institutionalized because of senile psy- 
choses, i.e., senile psychosis or psychosis with cerebral arteriosclero- 
sis, was tested. The senile patients were selected from three mental 
institutions with a combined population of about 6000 patients. As 
in the case of the normal elderly, all senile patients were native-born 
white with a minimum of four years of education. 

In individually administered tests, each page was timed until it was
completed or until two minutes had elapsed, whichever came first. In group
testing, the time limit for a page was set by preliminary study so that no
individual would complete the page within it: one minute for the 2-, 3-, and
4-digit problems and two minutes for all longer problems. A rest of thirty
seconds was allowed between pages.


Administration: High-school students were given the tests in groups 
of about 30. All aged subjects were tested individually. In all in- 
stances the subjects were told to do the problems as quickly as they 
could. Rate of uncorrected or total addition is defined as the total 








222 PSYCHOMETRIKA 


number of addition operations completed divided by the time inter- 
val. Rate of correct addition is defined as the number of operations 
in the problems added correctly divided by the time taken to com- 
plete all problems. In group testing the subjects were instructed to 
place a check mark opposite the digit they were adding when time
was called. They were also instructed to place a direction arrow 
alongside the column indicating the direction in which they added, 
i.e., top-down or bottom-up. The digits added in the incomplete prob- 
lems were included in the computation of the rate of addition for 
the individual. The estimated time spent on the incomplete problem 
was deducted from the total time in computing the rate of correct 
addition. 
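The two scoring rules can be written out directly. This sketch makes three assumptions: each completed problem is recorded as a pair (operations in the problem, added correctly or not); the check mark supplies the number of operations already performed on the unfinished problem; and the time spent on that problem is supplied as an estimate. `PageRecord` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PageRecord:
    completed: list[tuple[int, bool]]  # (operations in problem, added correctly?)
    partial_operations: int            # operations done on the unfinished problem
    elapsed_s: float                   # time interval for the page, in seconds
    partial_time_s: float              # estimated time spent on the unfinished problem


def total_rate(rec: PageRecord) -> float:
    """Uncorrected (total) rate: every operation performed, per second."""
    ops = sum(n for n, _ in rec.completed) + rec.partial_operations
    return ops / rec.elapsed_s


def correct_rate(rec: PageRecord) -> float:
    """Correct rate: operations in correctly added problems, divided by the
    time spent on completed problems (estimated partial time deducted)."""
    ops = sum(n for n, ok in rec.completed if ok)
    return ops / (rec.elapsed_s - rec.partial_time_s)
```

Consider a one-minute page with three completed four-operation problems, one of them wrong. Add two operations on an unfinished problem, estimated at five seconds. This gives a total rate of 14/60 and a correct rate of 8/55 operations per second.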

The effects of order of administration of the various lengths of 
problems were determined by giving the tests in different orders to 
two groups of 30 high-school senior boys matched for age and I.Q. 
on the Otis Self Administering Test. One group took the test in a 
constantly increasing series: 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, and 25 digits. 
The other group took the tests in the order: 2, 4, 8, 25, 3, 6, 15, 10, 
5, 9, and 7. An analysis of variance was made to test the significance 
of differences in slope and level of performance in the two groups. 
There were no significant differences in the slope or level of perform- 
ance when digits added per second was used as the measure of per- 
formance. When the logarithm of the operations was used, slight 
differences appeared which were inconsistent for the analyses of 
uncorrected addition and correct addition. In the case of uncorrected 
addition the difference of level was not significant for the two groups 
and the difference in slope was significant at the 5 per cent level. Less 
than 0.6 per cent of the sums of squares could be attributed to the 
differences in slopes between the groups. In the analysis of variance 
of correct addition, differences in slope were not significant whereas 
level was significant at the 5 per cent level. These results suggest 
that order of presentation had little if any effect upon the perform- 
ance of the subjects. 


Scoring: Individual tests were scored both as operations per second 
and correct operations per second. The results of each individual 
were graphed in duplicate. Two investigators independently fitted 
straight lines to the graphs. Results were plotted as log operations 
per second against log operations in the problem and as log log cor- 
rect operations per second against operations in the problem. In fit- 
ting the straight lines the middle portion of the curve was given 
emphasis because of the limitation of performance at low difficulty 
due to writing speed and the lack of reliability at high difficulty in 
the elderly. 

The correlation between the slopes derived by two investigators 
was 0.84 for total addition and 0.92 for correct addition in the high- 
school group. The corresponding correlations were 0.69 and 0.92 for 
the aged subjects. For each subject the slopes and intercepts derived 
by the two investigators were averaged. In the analysis of the data, 
the differences between the groups of subjects were thus secured by 
the mean of individual rates and not derived from a mean curve. 

Not all subjects were used in every comparison. In some in- 
stances the subjects were excluded when it was difficult to fit curves 
to the data because of excessive variability and missing values. In 
the aged and senile psychotic subjects the total output for long prob- 
lems per two minutes was in some instances so small as to result in 
large variability when considering correct addition. During group 
administration of the tests to high school students an occasional 
measurement would be missed. 


Results 


The data show a systematic relation between the rate of addi- 

[Figure 1: mean rate of uncorrected addition and mean rate of correct addition plotted against digits in problem]
FIGURE 1

Addition rate as a function of problem length. Mean values of 193 senior
high-school students aged 16 to 20 years are plotted as the mean rate of total
uncorrected addition and as the mean rate of correct addition.
tion and the length of the problem. However, it is apparent in Fig- 
ure 1 that this relation is not linear. The relation between rate of 
addition (not corrected for errors) and problem length was determined
to be of the form $Y = A/X^{n}$, where $Y$ is the rate of addition
in operations per unit time, $A$ the initial rate, or rate at a difficulty
of one operation, $n$ the slope, and $X$ the number of addition operations
per problem.


TABLE 1
Analysis of Variance of Addition Rate Using the
Logarithm of Addition Operations

Source                        Degrees of   Sum of     Variance
                              Freedom      Squares
Total                         2100         60.39
Individuals                    190         37.85      0.199
Columns, total                  10         13.39      1.339
  Linear regression              1         13.04
  Departure from linearity       9          0.35      0.039
Residual                      1900          9.15      0.005

Columns: F = 1.339/0.005 = 267.80
Curvilinearity: F = 0.039/0.005 = 7.80
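The partition in Table 1 can be reproduced with a standard two-way analysis (individuals by problem lengths). In this analysis the column sum of squares is split into a linear regression on log operations and a departure from linearity. The sketch assumes one log rate per subject per problem length with no missing cells. It is a generic textbook computation, not necessarily the exact worksheet of reference (8), and the function names are hypothetical.

```python
from statistics import fmean


def anova_with_trend(y: list[list[float]], x: list[float]) -> dict[str, tuple[float, int]]:
    """Partition sums of squares for an individuals x columns table.

    y[i][j] is the log addition rate of individual i on problem length j, and
    x[j] is the log number of operations for column j. The column sum of squares
    is split into a linear regression on x and a departure from linearity.
    Returns {source: (sum of squares, degrees of freedom)}.
    """
    r, c = len(y), len(x)
    grand = fmean(v for row in y for v in row)
    row_means = [fmean(row) for row in y]
    col_means = [fmean(row[j] for row in y) for j in range(c)]

    ss_total = sum((v - grand) ** 2 for row in y for v in row)
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)

    # Linear trend of the column means on x; each column mean rests on r entries.
    x_bar = fmean(x)
    sxx = sum((xj - x_bar) ** 2 for xj in x)
    sxy = sum((xj - x_bar) * (m - grand) for xj, m in zip(x, col_means))
    ss_linear = r * sxy ** 2 / sxx

    return {
        "total": (ss_total, r * c - 1),
        "individuals": (ss_rows, r - 1),
        "columns": (ss_cols, c - 1),
        "linear regression": (ss_linear, 1),
        "departure from linearity": (ss_cols - ss_linear, c - 2),
        "residual": (ss_total - ss_rows - ss_cols, (r - 1) * (c - 1)),
    }


def f_ratio(table: dict[str, tuple[float, int]], source: str) -> float:
    """Mean square of `source` divided by the residual mean square."""
    ss, df = table[source]
    ss_res, df_res = table["residual"]
    return (ss / df) / (ss_res / df_res)
```

In the terms of Table 1, `f_ratio(table, "columns")` corresponds to 1.339/0.005 and `f_ratio(table, "departure from linearity")` to the curvilinearity ratio 0.039/0.005.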


The goodness of fit (8) of the power function above was deter- 
mined for the data of the high-school students (Table 1). There is 
still some slight residual curvilinearity in the data. The equation is 
accepted as the best representation of the data, however, since the 
residual curvilinearity is a small portion of the total sum of squares 
and appears related to the limitation of writing speed in recording 
the answers to short problems. Small departures of the tests from 
“true” difficulty at a given level also may contribute to the slight 
curvilinearity. In short problems a larger portion of the time is 
spent writing answers, so the total rate of addition appears lowered
at the easy end of the difficulty scale. This is shown in Figure 2,
in which speed of writing has been correlated with rate of addition
for problems of different lengths. The correlations decline until a
stable minimum is reached at problems of
five digits or longer. Speed of writing was measured by having the 
subjects write digits as quickly as they could (1). The correlation 
between writing speed and rate of addition is lower at every level 
for the high-school students than for the sixty-year-old subjects. The 
suppression of addition rate due to writing speed is apparent in the 
curves of individual subjects. In Figure 3 the curve of a young adult 
illustrates the departure from linearity that may occur at low diffi- 


culty. 


























[Figure 2: correlation (r²) of copy and adding speed plotted against digits in problem, for subjects aged 60-69 and 16-20]
FIGURE 2 


Correlation (r²) between speed of writing digits and the rate of addition
for problems of different lengths. Values are based on 187 senior high-school 
students and on 45 normal elderly subjects. 


The dashed line represents an extrapolation of the linear portion 
of the curve. The Y value when x = 1 would represent the theoreti- 
cal maximum function for this subject, i.e., the rate of addition at 
a difficulty level of one operation. By graphing an individual sub- 
ject’s performance, the effect of writing speed can thus be excluded 
from the estimation of the slope and intercept values. 

Although the mean curves are only crude representations of the 
data they suggest that the elderly and the senile subjects show a dif- 
[Figure 3: rate of addition plotted against number of operations, subject B.B., age 23]


FIGURE 3 
Rate of addition as a function of difficulty in a 23-year-old subject. The
departure from linearity at low difficulty represents the influence of speed of
writing the answers. Extrapolation of the curve yields an estimate of maximum
performance unaffected by this disproportionate suppression of addition rate.
The subject was given the problems in random order.


ferent level or rate of addition than the high-school students (Figure 
4). The exponents of the equations, however, are not significantly 
different for the three groups. This was demonstrated by comparing 
the slopes of the individual plots of rate of total addition on log log 
graph paper (Table 2). 

Differences in the performance of the young and aged subjects 
are more striking in the rate of correct addition. The senile patients 
appear to decline most rapidly in the rate of correct addition (Figure 
5). Thus the senile person not only works more slowly but also less 
accurately when problem length is increased. 

When the rate of addition is corrected for errors in adding, a 
different equation is required to describe the data. It was found in 
the individual plots that a linear fit was obtained between the log log 
rate of correct addition and the number of operations per problem. 
This indicates that there is an exponential relation between the rate 
of correct addition and problem length of the form $Y = C^{a/m^{X}}$, where
TABLE 2
Addition Rate Parameters in Three Classes of Subjects

                                             High-School   Normal    Senile
                                             Students      Elderly   Patients
                                             N = 184       N = 46    N = 26
Total addition
  Mean log rate                              1.12          0.98      0.85
    σ                                        0.14          0.18      0.22
  Mean slope, Δ log rate / Δ log operations  0.30          0.27      0.385
    σ                                        0.11          0.10      0.22
Correct addition
  Mean log log rate at X = 0                 1.04          0.96      0.82
    σ                                        0.05          0.12      0.26
  Mean slope, Δ log log rate / Δ operations  0.021         0.023     0.044
    σ                                        0.013         0.016     0.036
  Mean operations at Y = 1                   68.0          64.9      40.0
    σ                                        46.0          44.1      48.0

Extreme skewing is present in some of the measures, so the mean and σ
values are only rough indications of the differences in the classes of subjects.
The X value for Y = 1 in rate of total addition was not computed, since the
groups did not differ in the slope of total addition.

All original values for total addition were multiplied by 10 and all correct
addition rates by 100. In securing the antilogarithms of the above values the
appropriate divisions should be made.


$Y$ = rate of correct addition. On the log log graphs, $\log a$ = ordinate
intercept, $-\log m$ = slope, $X$ = number of operations, and $C$ = base
of common logarithms. When the slopes of the individual plots were 
obtained, significant differences were found between the groups 
(Table 2). A larger difference was noted between the senile and the 
elderly subjects than between the elderly and the high-school stu- 
dents. 

If the X intercept on the log log graph is computed by dividing 
the Y intercept by the slope for each subject, a value is derived which 
represents the functional limit for the subject, i.e., the longest
problem that the subject may be expected to solve correctly in unit time
(Table 2). This value reveals large differences between the classes of
subjects and suggests that the senile patients have suffered a larger per cent
[Figure 4: mean adding speed, uncorrected, plotted against digits in problem for high-school subjects aged 16-20, normal subjects aged 60-69, and psychotic patients aged 60-70]


FIGURE 4 


Rate of total uncorrected addition as a function of problem length for three 
classes of subjects: 193 senior high-school students, 46 normal elderly subjects 
aged 60-69 years, and 33 senile psychotic patients aged 60-70 years. 


loss of function than the normal elderly. These values may have bio- 
logical validity in that they demonstrate that the difference in func- 
tion between the senile mental patients and the normal individuals 
of the same age is greater than the difference between the young 
and elderly normal subjects. 
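The functional limit is computed per subject and then summarized by group, as sketched below. The sketch takes each subject's fitted slope as negative, following "$-\log m$ = slope"; Table 2 lists slope magnitudes. The function names are hypothetical.

```python
from statistics import fmean, median


def functional_limit(intercept: float, slope: float) -> float:
    """X at which the fitted line log10(log10 Y) = intercept + slope * X
    reaches the X axis of the log log plot: intercept / |slope|."""
    if slope >= 0:
        raise ValueError("expected a declining (negative) slope")
    return intercept / -slope


def group_limits(fits: list[tuple[float, float]]) -> tuple[float, float]:
    """Mean and median functional limit over subjects' (intercept, slope) fits."""
    limits = [functional_limit(b0, b1) for b0, b1 in fits]
    return fmean(limits), median(limits)
```

The mean of per-subject ratios need not equal the ratio of group means. For the high-school students, 1.04/0.021 ≈ 49.5, whereas the mean reported in Table 2 is 68.0. This is one reason the median is a useful companion statistic, given the skewing noted under that table.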
[Figure 5: median per cent correct addition plotted against digits in problem for students aged 16-20, normal subjects aged 60-69, and senile patients aged 60-70]


FIGURE 5 
Per cent correct addition as a function of problem length. Median values 
are given for 193 senior high-school students, 50 normal subjects aged 60-69 
years and 33 senile mental patients age 60-70 years. Median values were pre- 
ferred to means because of the frequency of zero correct performances for diffi- 
cult problems in the aged subjects. 





Discussion 
The present study has undoubtedly oversimplified many rela- 
tions and perhaps ignored certain relevant variables. The results 
clearly suggest, however, that there is value in the approach. The 
present results provide an empirical background as well as a con- 
ceptual framework for future modification. There are several fea- 
tures of these results that have important implications: (1) the 
demonstration that rate of addition may be treated as a psychophysi- 
cal function, (2) the finding that the rate of total and correct addi- 
tion and the decline of correct addition with increased problem length 
show differences between young and aged subjects, and (3) the finding that
extrapolation of the individual curve permits the extraction of a value
representing the subject’s theoretical maximum limit.

In the discussion of these results it seems desirable to designate 
the intercepts and slopes in a way to suggest their conceptual origin. 
If one has rectified the data and obtained a linear relation as in the 
case of addition rate, it is simple to derive the parameters of the 
function. These parameters may be given a psychological interpretation.
Thus the Y value representing the rate of addition for
problems of zero or unit length might be called the functional maximum,
FM. The X value or problem of maximum length where the
rate of addition is zero or unity might be called the functional limit, 
FL. The slope representing the decline in rate with increased diffi- 
culty might be called the functional decline, FD. The area under the 
curve represents work completed during the test session and might 
be called the functional capacity, FC. This value was not computed 
in the present study because in deriving the expression the FM would 
have to be squared. Thus errors are exaggerated and unless one has 
considerable confidence that this intercept is accurate it would be 
best to use the FL as the index of the person’s function in comparing 
him with other individuals. 
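The remark that computing FC would require squaring FM can be made concrete under one reading. This reading is an assumption, since the text gives no formula. Treat the rectified relation as a straight line that starts at FM, falls with slope of magnitude FD, and reaches zero at FL. FC is then the triangular area beneath it:

```latex
FL = \frac{FM}{FD}, \qquad
FC = \tfrac{1}{2}\, FM \cdot FL = \frac{FM^{2}}{2\,FD}, \qquad
\frac{\Delta FC}{FC} \approx 2\,\frac{\Delta FM}{FM} - \frac{\Delta FD}{FD}
```

To first order, a given relative error in FM is doubled in FC but carried only once into FL. This agrees with the recommendation to compare individuals on FL.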

The slope or FD is presumably free from the effects of individ- 
ual differences in motivation, criterion of acceptable accuracy, and 
speed of visual perception and motor response so long as the subject 
maintains these at a constant level throughout the test series. All 
other measures, the FM, FL, and the FC are influenced by individual 
differences in such constants of performance. This would suggest 
that the FD of the correct addition is more desirable in comparing 
subjects of widely different ages and background in whom there might 
be large individual differences in level of motivation. Any transitory 
states such as warm-up, fatigue, and practice within the test session 
would influence the slope function, FD. In view of the early work 
(11), however, it would seem that much longer test sessions than 
used in this study would be required to elicit sizable effects from such 
variables. 

Whether the results of the present study would apply to tests 
of double or triple the lengths used in this study is of course not 
known. It is possible that the functional relation between problem 
length and rate of addition would show some deviation from the 
equations given here for problems of extreme length. The use of 
problems of a length longer than 25 digits is prohibitive for most of 
the senile patients and many of the normal elderly. It takes the aged 
so long to do a problem of 25 digits that it is very difficult to obtain 
a reliable estimate of their rate of correct addition with paper-and- 
pencil methods. With this in mind plus the complication of writing 
speed for short problems, it seems that the optimum length of problems
for computing the slope and intercept values for practical applications
lies between 5 and 15 digits.

The individual differences in the parameters of rate and level 
isolated in this study give promise of conceptual value in analyzing 
the mental performance of different classes of subjects. 


Summary and Conclusions 


1. The purpose of this study was to examine the relation be- 
tween the rate of simple addition and the length of the digit series 
to be added. The hypothesis tested was that the rate of addition 
would decline in a systematic manner as a function of the logarithm 
of the number of addition operations per problem. Addition prob- 
lems were prepared that consisted of random digits arranged in sin- 
gle columns that varied in length from 2 to 10, 15, and 25 digits. The 
task required the subject to add the digits as quickly as possible. 
Three groups of subjects varying in age and mental status were 
used: (A) 193 senior high-school students, (B) 50 subjects aged 60
to 69 years, and (C) 33 patients institutionalized for senile
psychoses. All subjects were native-born white. The elderly subjects 
and the senile mental patients were matched for age and education. 

2. The rate of addition was found to be a function of the length 
of problem as measured by the number of operations. The equation 
expressing this relation for rate of total addition is of the form
$Y = A/X^{n}$. The equation expressing this relation for rate of correct
addition is of the form $Y = C^{a/m^{X}}$. In each case $Y$ is the addition
rate in operations per unit time and $X$ the number of operations in
the problem.

3. Significant differences were found in the general rate of 
addition, or value of Y at lowest difficulty, for the three groups of 
subjects. No difference was noted in the slope of total addition. How- 
ever, significant differences were found between the groups of sub- 
jects in slopes of correct addition. When the Y intercept of correct 

addition is divided by the slope to secure the limiting value of X, the
differences between the groups were accentuated, indicating that the
senile subjects not only add more slowly but their accuracy drops 
disproportionately when difficulty is increased. The longest problem 
they would be expected to do correctly in infinite time was markedly 
decreased. Because of the results obtained, it is suggested that the 
parameters derived for each subject be designated in such a way as 
to suggest their conceptual origin. Thus the value of Y at X = 0 or 
unity is defined as the functional maximum, FM; the slope as the 
functional decline, FD; the value of X at Y = 0 or unity as the func- 
tional limit, FL. In addition the area under the curve may be deter- 


mined and defined as the functional capacity, FC. 

4. It has been demonstrated that there is a uniform quantitative
relation between the rate of addition and the length of the problem to 
be added. By appropriate graphing of the individual results, esti- 
mates of the slope and intercept parameters may be obtained. These 
parameters appear useful in analyzing the performance of different 
classes of subjects. 

REFERENCES 
1. Birren, J. E., and Botwinick, J. The relation of writing speed to age and to the senile psychoses. J. consult. Psychol., 1951, 15, (In Press).
2. Coombs, C. H. The concepts of reliability and homogeneity. Educ. psychol. Meas., 1950, 10, 48-56.
3. Guilford, J. P. The psychophysics of mental test difficulty. Psychometrika, 1937, 2, 121-133.
4. Guilford, J. P. The difficulty of a test and its factor composition. Psychometrika, 1941, 6, 67-77.
5. Landahl, H. D. Time scores and factor analysis. Psychometrika, 1940, 5, 67-74.
6. Oehrn, A. Experimentelle Studien zur Individualpsychologie. Psychol. Arbeit., 1895, 1, 92-151.
7. Richardson, M. W. The relation between the difficulty and the differential validity of a test. Psychometrika, 1936, 1, 38-49.
8. Snedecor, G. W. Statistical methods. Ames, Iowa: Iowa State College Press, 1950; pp. 374-399.
9. Tate, M. W. Individual differences in speed of response in mental test materials of varying degrees of difficulty. Educ. psychol. Meas., 1948, 8, 353-374.
10. Thurstone, L. L. Ability, motivation, and speed. Psychometrika, 1937, 2, 249-254.
11. Vogt, R. Ueber Ablenkbarkeit und Gewöhnungsfähigkeit. Psychol. Arbeit., 1899, 3, 62-201.
12. Whipple, G. M. Manual of mental and physical tests: Part II. Baltimore: Warwick & York, 1915; pp. 460-485.

Manuscript received 9/27/50 
Revised Manuscript received 11/18/50 











PSYCHOMETRIKA—VOL. 16, NO. 2 
JUNE, 1951 


A MECHANICAL MODEL ILLUSTRATING THE SCATTER 
DIAGRAM WITH OBLIQUE TEST VECTORS 


HAROLD GULLIKSEN 
AND 


LEDYARD R TUCKER 
PRINCETON UNIVERSITY AND EDUCATIONAL TESTING SERVICE 


A mechanical model is described for illustrating changes in the 
configuration of points when the reference axes are rotated oblique- 
ly and the values of the orthogonal projections of the points on the 
reference axes are maintained constant. This type of transforma- 
tion is important in factor analysis. 


Two different methods of graphic representation are in general 
use in correlation analysis and factor analysis (1, 2). Since the re- 
lations and contrasts between these geometric structures are some- 
times rather difficult for students to grasp, a mechanical model was 
devised to facilitate class explanations. This device is to be described. 

In the first of the graphic representations, each test is taken as 
an orthogonal coordinate axis and each person is represented by a
point with coordinates equal to his scores on the tests. When there 
are two tests this results in the conventional scatter diagram. Corre- 
lation between the tests results in an elliptical type scatter of points. 

In the second method of representation, uncorrelated factors or 
components are taken as the orthogonal coordinate axes. In the “fac- 
tor space” both tests and persons may be represented simultaneously 
as follows: 


(a) each test, by a vector from the origin with the coordinates 
of its terminus equal to the test’s factor loadings; 


(b) each person, by a point with coordinates equal to his factor
scores.


This “factor space,” containing a vector for each test and a point 
for each person, has several interesting properties. (1) The test score 
of each person may be represented by the perpendicular projection 
of his point on the test vector. (2) There is a circular, or spherical, 
configuration of the people’s points. (3) Correlation between two
tests is indicated by the cosine of the angle between the two test 
vectors. 

The type of transformation between these two systems of geo- 
metric representations can be contrasted with the more common 
“rigid rotation” of axes in which the configuration of points remains 
constant while the axes are rotated; hence the projections of the 
points on the axes change. This is a familiar type of transformation 
which appears in many problems other than those of factor analysis. 
The change in projections can be represented by a matrix equation. 
On the other hand, in the type of transformation which we wish to 
illustrate in this paper there is a change in the angle between axes 
or vectors representing tests while the configuration of points 
changes in such a manner that the projections of the points on the 
test vectors are held constant.* The advantage in factor analysis is 
that the component elements in the space can be made to take on dif- 
ferent significances after this type of geometric transformation. 

An example in two-space of the results for the second type of 
transformation is given in Figure 1. We begin with an initial con- 
figuration in which the axes are at 90° separation. The plotted points 
are assumed to have a positive correlation and hence form an ellipse. 
If the coordinates of the points are standard scores of individuals on 
two tests, the correlation between tests is related to the ratio of the 
principal axes of this ellipse. Now let us consider a transformation 
such that the coordinate axes are separated by an angle $\theta$. The test
scores are still plotted as projections perpendicular to these new axes.
An angle $\theta$ can be so chosen that the points will form a circle. For
this geometric representation the correlation between the two tests
is now represented by the cosine of $\theta$. This second configuration is
illustrated at the right in Figure 1. The projections of point 7 are 
illustrated for each configuration. It is to be noted that the projec- 


*Professor Ernst Snapper (Visiting Associate Professor from Southern California,
now at the Department of Mathematics, Princeton University) suggested
that the change in coordinates of the points on a fixed pair of orthogonal axes
can be described as a combination of a stretch transformation and a shear
transformation, both parallel to the y-axis. The x coordinates remain constant
while the y coordinates undergo the following two transformations:

$$y' = \frac{y}{\cos\theta} \qquad \text{(stretch transformation)}$$

$$y'' = y' - (\tan\theta)\,x \qquad \text{(shear transformation)}$$

where $\theta$ is the angle of rotation of the axis representing test y. The
combined transformation is:

$$\begin{pmatrix} x \\ y'' \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -\tan\theta & \dfrac{1}{\cos\theta} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$
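The following is a numerical check of the footnote's transformation. The sketch assumes the test y axis is the vertical axis rotated by θ toward the positive x axis, the sign convention under which the shear term is subtracted. It confirms that the transformed point still projects onto that rotated axis at the original score y. The function names are illustrative.

```python
import math


def stretch_then_shear(x: float, y: float, theta: float) -> tuple[float, float]:
    """Move a point so that its projections stay (x, y) after the test-y axis is
    rotated by theta radians toward the positive x axis (assumed convention).
    Stretch: y' = y / cos(theta); shear: y'' = y' - tan(theta) * x; x unchanged."""
    y_stretched = y / math.cos(theta)
    return x, y_stretched - math.tan(theta) * x


def projection_on_rotated_y_axis(px: float, py: float, theta: float) -> float:
    """Perpendicular projection of (px, py) on the unit vector (sin theta, cos theta)."""
    return px * math.sin(theta) + py * math.cos(theta)
```

Under this convention the two test axes meet at 90° − θ, so the correlation they represent is cos(90° − θ) = sin θ.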








HAROLD GULLIKSEN AND LEDYARD R TUCKER 
tions on the coordinate axes are identical in the two configurations; 
however, the point 7 has moved. 

This type of transformation is a homogeneous linear non-singu- 
lar transformation of axes; e.g., a projective transformation of axes. 
Simultaneously the points in question undergo a coordinate-preserv- 
ing transformation. 

Since students have considerable difficulty in grasping the ideas 
involved in this second type of transformation, a mechanical device 
was constructed to illustrate this transformation in a two-space. 

The device consists of two boards with slots in them as shown 
in Figure 2. The bottom board was made of 1/8” bakelite, 16” × 20”
(24” would be better). The top board was smaller: a 14” × 18” piece
of 3/8” lucite was used. The boards were pivoted to each other by a
bolt at the origin, labelled “O”. The pivot point was reinforced by a
1 1/4” metal disk as indicated in the figure. Twenty slots each 3/16” 
wide and 16” long were milled in the boards. These slots were 5/8” 
apart (center to center) leaving a 7/16” strip of material between 
the slots. 

Aluminum pegs, shown in Figure 3, can be inserted at any in- 
tersection of two slots, as indicated in Figure 4. These pegs were 
held in place by snap-in trimounts available from a radio supply 
house. 


FIGURE 3

FIGURE 4

The upper lucite board may be rotated from an initial angle of 
90 degrees with the lower board to an angle of about 35 degrees. 
This corresponds to a cosine or “correlation” of about .80. The projection
of each point on each axis remains invariant for any rotation,
but the inter-point distances, that is, the configuration of points,
change as shown in Figure 5. Thus the “correlation” shown by the
set of points will be a function of both the original configuration of 


points and the angle between the axes. 



FIGURE 5 


With this device any configuration of points can be set up with 
orthogonal axes and rotated to —35 or + 35 degrees corresponding 
approximately to correlations of —.8 to +.8. Beyond this rotation, 
the pegs will tend to jam. This device has furnished a very effective 
means for demonstrating the effects of transformation from orthogo- 
nal to oblique axes for students of factor analysis. 


REFERENCES 
1. Thomson, Godfrey H. The factorial analysis of human ability. New York: Houghton Mifflin Co., 1946.
2. Thurstone, L. L. Multiple-factor analysis. Chicago: The University of Chicago Press, 1947.


Manuscript received 5/30/50 
Revised manuscript received 12/15/50 

















PSYCHOMETRIKA—VOL. 16, NO. 2 
JUNE, 1951 


A GRAPHICAL METHOD FOR THE RAPID CALCULATION OF 
BISERIAL AND POINT BISERIAL CORRELATION 
IN TEST RESEARCH 


HOWARD W. GOHEEN AND MELVIN D. DAVIDOFF 
U. S. CIVIL SERVICE COMMISSION 


A description is given of a diagram (available separately) for 
computing biserial or point biserial correlation coefficients. The dia- 
gram is maximally useful where large numbers of coefficients are 
to be calculated in test item analysis. The diagram is entered with 
the mean criterion score of the group passing the item and the pro- 
portion of correct answers to the item. 


Current emphasis on test analysis procedure has led to an increased application of biserial and point biserial correlation methods in test-item analysis, to which these techniques are particularly well suited. The nomograph presented on a following page permits particularly rapid calculation of these coefficients with accuracy to the second decimal.* When IBM equipment is available for the determination of item means and proportions, this nomograph represents the shortest method for calculating these coefficients known to the authors. It is most useful when large numbers of coefficients are to be calculated for individual tests. For calculating single coefficients from particular data, it offers little if any advantage over other methods.


Given the mean ($\bar{X}$) and standard deviation ($\sigma$) of the total group on the criterion by which the test items are to be evaluated, the only data necessary for determining either of the two coefficients for any item are the proportion of correct answers to that item ($p$) and the mean criterion score ($M$) of those who answered the item correctly.
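The article does not print the formulas the chart embodies. As a minimal sketch, assuming the chart encodes the standard textbook formulas, the biserial coefficient is $r_{bis} = \frac{M - \bar{X}}{\sigma} \cdot \frac{p}{y}$, where $y$ is the ordinate of the unit normal curve at the point cutting off the upper proportion $p$, and the point biserial coefficient is $r_{pbis} = \frac{M - \bar{X}}{\sigma}\sqrt{p/q}$, with $q = 1 - p$. The function names below are hypothetical; the normal curve comes from Python's `statistics.NormalDist` (Python 3.8 or later).

```python
from math import sqrt
from statistics import NormalDist

UNIT_NORMAL = NormalDist()  # mean 0, standard deviation 1


def biserial_r(p: float, m_pass: float, mean: float, sd: float) -> float:
    """Biserial r for one item (hypothetical helper name)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # z-score that cuts off the upper proportion p of the unit normal curve
    z = UNIT_NORMAL.inv_cdf(1.0 - p)
    y = UNIT_NORMAL.pdf(z)  # height (ordinate) of the curve at that point
    return (m_pass - mean) / sd * p / y


def point_biserial_r(p: float, m_pass: float, mean: float, sd: float) -> float:
    """Point biserial r for one item (hypothetical helper name)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    q = 1.0 - p
    return (m_pass - mean) / sd * sqrt(p / q)


# Item 5 of the worked example: X-bar = 65, sigma = 9, p = .20, M = 72
print(f"{biserial_r(0.20, 72, 65, 9):.3f}")        # 0.556
print(f"{point_biserial_r(0.20, 72, 65, 9):.3f}")  # 0.389
```

For item 5 of the worked example the formulas give about .556 and .389, consistent with the chart readings of .55 and .39 reported there.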

The ordinate markings on the chart are in symbol form so that it can be adapted for use on any particular test. Before using the chart, the appropriate values should be entered as indicated along the ordinate of the chart. (Note that these ordinate values will be identical whether the chart is turned for computation of biserial r’s or of point biserial r’s.) If, for example, $\bar{X} = 65$ and $\sigma = 9$, these values would read:

$\bar{X} + .4\sigma = 68.6$
$\bar{X} + .3\sigma = 67.7$
$\bar{X} + .2\sigma = 66.8$
$\bar{X} + .1\sigma = 65.9$
$\bar{X} = 65$

It will be noted that the ordinate entries simply increase by $.1\sigma$ for each value.

*The diagram is published in reduced size for illustrative purposes only. Interested readers may obtain a copy of the diagram, size 10½ × 16 inches, without charge by writing to the Test Development Section, United States Civil Service Commission, Washington 25, D. C., requesting a copy of Test Technical Series No. 17.
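Preparing the ordinate labels for a particular test is a single calculation per step. A sketch, with a hypothetical function name, that reproduces the values listed above for $\bar{X} = 65$ and $\sigma = 9$; rounding to two decimals only hides floating-point noise.

```python
def ordinate_values(mean: float, sd: float, steps: int) -> list[tuple[float, float]]:
    """(multiple of sigma, score) pairs from X-bar upward in .1-sigma steps."""
    values = []
    for k in range(steps + 1):
        multiple = k / 10  # 0, .1, .2, ... standard deviations above the mean
        values.append((multiple, round(mean + multiple * sd, 2)))
    return values


# X-bar = 65, sigma = 9, labels up to X-bar + .4 sigma, printed top to bottom
for multiple, score in reversed(ordinate_values(65, 9, 4)):
    print(f"X-bar + {multiple:.1f} sigma = {score}")
```

Only $\bar{X}$ and $\sigma$ change from one test to the next, so the same loop serves any criterion once those two constants are known.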


Example: 
The mean criterion score for the total group ($\bar{X}$) is found to be 65; the standard deviation ($\sigma$) of this distribution is 9. Now let

$p_1, p_2, \ldots, p_7$ = the proportion answering items 1 through 7 correctly
$M_1, M_2, \ldots, M_7$ = the mean criterion score of those answering items 1 through 7 correctly


Suppose that you want the biserial coefficient for item 5, that 20% of the sample got it right ($p_5 = .20$), and that the mean criterion score of those who got it right was 72 ($M_5 = 72$). Find the intersect of 72 on the ordinate and .20 on the abscissa. This yields a biserial r of .55. If the data are such that the point coefficient is the more appropriate, rotate the chart 180° and read at the intersect as before. For the example given, the point biserial value is .39.
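Why does the same intersect give a smaller value once the chart is turned? Under the standard textbook formulas for the two coefficients (an assumption; the article does not state them), the point biserial equals the biserial multiplied by $y/\sqrt{pq}$, a factor that depends only on $p$. A sketch checking this against item 5; the helper name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

UNIT_NORMAL = NormalDist()


def pbis_over_bis(p: float) -> float:
    """Ratio r_pbis / r_bis for an item passed by proportion p (hypothetical name)."""
    q = 1.0 - p
    y = UNIT_NORMAL.pdf(UNIT_NORMAL.inv_cdf(q))  # ordinate where the upper area is p
    return y / sqrt(p * q)


# Item 5: p = .20 and (M - X-bar) / sigma = (72 - 65) / 9
d = (72 - 65) / 9
r_bis = d * 0.20 / UNIT_NORMAL.pdf(UNIT_NORMAL.inv_cdf(0.80))
r_pbis = r_bis * pbis_over_bis(0.20)
print(f"factor {pbis_over_bis(0.20):.3f}")                   # 0.700
print(f"biserial {r_bis:.3f}, point biserial {r_pbis:.3f}")  # 0.556, 0.389
```

The factor never exceeds about .80 (its value at $p = .50$), so under these formulas the point biserial reading is always smaller in magnitude than the biserial reading for the same entry.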

It is apparent that when the mean criterion score of those passing the item is less than the mean of the total sample, the sign of the coefficient will be negative. In such cases it is necessary only to obtain the difference between the mean of the total ($\bar{X}$) and the mean of the passing group ($M$) and add this difference to the total mean. This can be done by the formula $2\bar{X} - M$. Enter the chart as before, affixing a negative sign to the obtained coefficient.


Example: 
Item six yields the following data: $p_6 = .25$; $M_6 = 57$; $2\bar{X} - M_6 = 73$. At the intersect of 73 and .25 the biserial value is .70 and the point biserial value is .51. These signs must be reversed to read −.70 and −.51 respectively.

[Figure: Computing diagram for the biserial correlation coefficient; turned 180°, the same chart is the computing diagram for the point biserial correlation coefficient. Ordinate: mean criterion score of passing group, marked in steps of $.1\sigma$ above $\bar{X}$. Abscissa: proportion of total group passing item, .05 to 1.00.]
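The reflection step for negative coefficients can be checked numerically. A sketch, again assuming the standard textbook formulas for the two coefficients, that compares the chart procedure (enter $2\bar{X} - M$, then attach a minus sign) with the formulas applied directly to $M$; all function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import Callable

UNIT_NORMAL = NormalDist()
Coefficient = Callable[[float, float, float, float], float]


def biserial_r(p: float, m_pass: float, mean: float, sd: float) -> float:
    y = UNIT_NORMAL.pdf(UNIT_NORMAL.inv_cdf(1.0 - p))  # ordinate at the p/q split
    return (m_pass - mean) / sd * p / y


def point_biserial_r(p: float, m_pass: float, mean: float, sd: float) -> float:
    return (m_pass - mean) / sd * sqrt(p / (1.0 - p))


def chart_procedure(p: float, m_pass: float, mean: float, sd: float,
                    coef: Coefficient) -> float:
    """Mimic the chart: reflect a below-mean M to 2*mean - M, then negate."""
    if m_pass >= mean:
        return coef(p, m_pass, mean, sd)
    reflected = 2 * mean - m_pass  # the article's 2X-bar - M
    return -coef(p, reflected, mean, sd)


# Item 6: p = .25, M = 57, X-bar = 65, sigma = 9, so 2X-bar - M = 73
for coef in (biserial_r, point_biserial_r):
    via_chart = chart_procedure(0.25, 57, 65, 9, coef)
    direct = coef(0.25, 57, 65, 9)
    print(f"{coef.__name__}: chart {via_chart:.2f}, direct {direct:.2f}")
```

Both columns print −0.70 and −0.51. They agree because both formulas are linear in $M - \bar{X}$: reflecting $M$ about $\bar{X}$ flips the sign of that difference and changes nothing else.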

If, however, the item-analysis data are such that any appreciable number of negative values is anticipated, a second scale can be prepared for the ordinates in the same fashion as indicated above except with reversed sign, i.e., ascending from $\bar{X}$ thus: $\bar{X} - .1\sigma$, $\bar{X} - .2\sigma$, $\bar{X} - .3\sigma$, etc.
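The second scale and the $2\bar{X} - M$ formula locate the same point on the chart. A sketch, with hypothetical helper names, that finds how many $.1\sigma$ steps a below-mean $M$ sits on the reversed scale and reads the label printed at that same height on the original scale, using item 6 from the example:

```python
def reversed_scale_position(m_pass: float, mean: float, sd: float) -> float:
    """Steps of .1 sigma that m_pass lies below the mean (reversed-scale height)."""
    return (mean - m_pass) / (0.1 * sd)


def original_label_at(steps: float, mean: float, sd: float) -> float:
    """Score printed on the original (upward) scale at the same height."""
    return mean + steps * 0.1 * sd


steps = reversed_scale_position(57, 65, 9)  # item 6: M = 57, X-bar = 65, sigma = 9
label = original_label_at(steps, 65, 9)
print(f"{steps:.2f} steps up the chart; original-scale label {label:.1f}")
print(f"2 * X-bar - M = {2 * 65 - 57}")
```

The label found this way equals $2\bar{X} - M$, so entering $M$ on the reversed scale and entering $2\bar{X} - M$ on the original scale read the same coefficient; only the sign must be supplied.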

The method presented here requires fewer constants than other 
methods of determining biserial values and has the very distinct ad- 
vantage of eliminating the actual computation of the coefficient. The 
usual cautions on the interpretations of serial correlation coefficients 
in test item analysis are of course in order. 


Manuscript received 10/19/50 























PSYCHOMETRIKA—VOL. 16, NO. 2 
JUNE, 1951 


BOOK REVIEWS 


GEORGE KINGSLEY ZIPF. Human Behavior and the Principle of Least Ef- 
fort. Cambridge: Addison-Wesley Press, Inc., 1949. Pp. xi + 578. 


This book touches “economics, sociology, cultural anthropology, psychology 
—both general and Freudian—linguistics, and semantics” by the author’s own 
admission. The author’s thesis is that, in order to survive, an organism must 
select a course of action which will minimize its present work plus its probable 
work of the future. This is referred to as the principle of least effort. Zipf 
applies this principle to virtually all behavior of all organisms. In the linguistic 
area, he considers such items as frequency and rank order of words, word mean- 
ings and rank, frequency and number of words, and repetition interval and num- 
ber of words, in connection with samples of the speech and written products of 
children, adults, and psychotics, in many different languages. The sizes of cities, 
number of retail stores, factories, service and business establishments, news 
items in daily papers, obituaries in the Times, marriage licenses issued in Philadelphia as a function of the blocks separating the couples, and the distribution of communities in different nations all lend support to the least effort principle, according to Zipf.

Many interesting empirical curves are presented, and the author shows great 
originality on every page; but the reviewer, after long and painful effort, could 
not follow the author’s logic or mathematics in relating the curves to the hy- 
pothesis. 

At times it is extremely difficult to take Zipf seriously. For example, the 
author can make procreation fit into the least effort doctrine only by assuming 
an ego or “identity-point” which survives the organism. He admits that he 
doesn’t know whether “it” is or is not eternal, and if “it” survives death he 
doesn’t know what “it” is. But it leads him to the conclusion that there is 
always the same fixed number of organisms alive on the planet at any one time. 
The reviewer is aware of a number of lines of reasoning which have led men to 
posit an eternal soul, but this is the only soul, to his knowledge, which is required 
by virtue of the general fact of procreation. Another indication of Zipf’s line 
of attack is the following: ‘. . . For as soon as the father falls in love with a 
child and “wants” or “desires” it, he polarizes himself in the opposite direction 
sexually, and thereby theoretically sets up reactions to produce a child of the 
opposite sex, thus confirming the hoary superstition that the sex of one’s off- 
spring is the opposite from that desired by the parents,’ p. 262. Zipf accomplishes 
similar feats on every other page. 

Zipf has been publishing material of this sort since 1929. His present work 
represents the broadest and most detailed application of his approach. This re- 
viewer feels it cannot be taken seriously as a scientific effort. 


University of Wisconsin David A. Grant















ALPHONSE CHAPANIS, WENDELL R. GARNER, CLIFFORD T. MORGAN. 
Applied Experimental Psychology. New York: John Wiley & Sons, Inc., 1949. 
Pp. xi + 434. 


Applied Experimental Psychology is an outgrowth of a series of lectures 
presented by the authors to engineering students at the Naval Post Graduate 
School, Annapolis, in the Spring of 1947. The lectures were subsequently printed 
as a classified Navy publication, and now appear in this form.

The subject matter of the book has been referred to variously as human 
engineering, engineering psychology, biomechanics, applied psychophysics, psy- 
chotechnology, and systems research. Whatever its proper name, the discipline 
is that which was so stimulated by the recent war. It proceeds from the basic 
tenet that machines (and systems of men and machines) must be designed in 
terms of the human abilities and limitations of the operators. 

This book will be of great interest to engineers, technicians, and industrial 
designers; it will acquaint these people with the activities of psychologists; it 
gives concrete examples of the use of psychological research results in design 
work. Applied Experimental Psychology will do a good public relations job for 
professional psychology because of the authors’ serious effort, by and large suc- 
cessful, to describe research methods and results in terms that can be understood 
by non-psychologists. Their example of the grouping of production data into 
analysis of variance tables shows clearly, without going into theory or comput- 
ing formulas, the idea behind the technique. There is a commendable emphasis 
on methodology which should better explain the tools that the psychologist brings 
to bear on problems he tackles. 

Which is not to say that Applied Experimental Psychology is not an im- 
portant contribution to psychology itself. It is a first statement of the structure 
and content of this aspiring discipline. As such, this book must be considered 
as a text for a course which now must be an integral part of the curriculum of 
industrial psychology, and, it is to be hoped, for engineering as well. 

The text is divided into four sections: methodology, sensation and percep- 
tion, motor behavior, and the working environment. The authors do not imply 
by word or tone that this structure has been completely or finally filled in; they 
do, in fact, point out the gaps and areas of controversy. But in discussing the 
controversies, they are willing, as applied scientists, to render judgments for the 
benefit of those who are more concerned with positive suggestions than the reci- 
tation of theoretical conflicts. 

Not the least of the values of Applied Experimental Psychology is in setting 
the tone for effective employment of the applied scientist. It has been and still 
is difficult for the applied scientist, especially the psychologist, to find his most 
meaningful mode of activity. There are all too few examples of practicing psy- 
chologists who are able, in the competitive industrial world, to utilize in a prac- 
tical way the best of academic techniques and attitudes in finding the answers 
to industrial problems. This text illustrates the use of fundamental research 
techniques in a powerful way for the solution of pressing questions of the day. 
In prosecution of this argument, it must be said that Applied Experimental 
Psychology exposes a number of superficial treatments of data. Research on 
systems is an area where only the most cursory techniques have been developed. 
There is a challenge here for the psychometrician. Considerable ingenuity of the highest order is required of the applied scientist in discovering molar concepts and measuring them under conditions which are not as convenient as the laboratory. Incomplete data and data which are biased by known variables can and must be dealt with by an applied psychologist in a competent way. The alternative, so frequently invoked, is the “quick and dirty” job, whose ultimate value for both the consumer and the applied scientist is questionable.

Applied Experimental Psychology is, then, an estimable effort. It has im- 
mediate value for the industrial designer, the industrial psychologist, students 
of both industrial psychology and engineering, and for the profession of psy- 
chology at large. 


New York University Robert L. Chapman 


BOOKS RECEIVED 
CALVIN P. STONE (Ed.) Annual Review of Psychology, Vol. II. Stanford: An- 
nual Reviews, Inc., 1951. Pp. ix + 389. 
Tables d’Intérêts et d’Annuités. Brussels: Crédit Communal de Belgique, 1950.








